Abstract — In this paper, an integrated Certificateless Public Key Infrastructure (CLPKI) that focuses on key 
management issues is proposed. The proposed scheme provides two-factor private key authentication to protect the 
private key in case of device theft or compromise. The private key in the proposed scheme is not stored on the 
device; rather, it is calculated every time the user needs it. It also depends on a password chosen by the user, so 
even if the device is stolen, the attacker cannot obtain the private key without knowing the user's secret 
password. The proposed model provides other key management features such as private key recovery, private key 
portability and private key archiving. 

2. Paper 31101324: An Attribute-Based Public Key Infrastructure (pp. 11-18) 

Hendri Nogueira, Jean Everson Martina and Ricardo Felipe Custodio 
Federal University of Santa Catarina - Florianopolis - SC, Brazil 

Abstract — While X.509 Public Key Infrastructures (PKIs) and X.509 Attribute Certificates (ACs) enforce strong 
authentication and authorization procedures (respectively), they do not give users control over their own 
attributes. This matters especially for users' personal information: when a service provider requests more 
than is necessary, or sensitive information such as medical data, users need control over the attributes they 
are sharing. We present an Attribute-Based Public Key Infrastructure that addresses the management of users' 
attributes and gives users more control in identity and access management systems and in 
document signatures. Our user-centric scheme also simplifies establishing confidence in the validity of attributes 
and the verification procedures. 

Index Terms — Attribute-Based, Public Key Infrastructure, Identity Management, Attributes, User-Centric. 

3. Paper 31101325: Map Visualization of Shortest Path Searching of Government Agency Location Using Ant 
Colony Algorithm (pp. 19-23) 

Candra Dewi and Devi Andriati, 

Program of Information Technology and Computer Science, Brawijaya University 

Abstract — Shortest path searching is the problem of reaching a destination in the least time along the 
shortest path. Several shortest path searching systems have therefore been developed as tools for reaching a 
destination without spending a lot of time. This paper implements a map visualization of shortest path search 
results for government agency locations using the ant colony algorithm. The ant colony algorithm is a 
probabilistic technique that is driven by ant pheromone. The shortest path search considers factors such 
as traffic jams, road direction, departure time and vehicle type. Testing is done to obtain the ant tracking 
intensity controlling constant (α), used to calculate the probability of the route selected by an ant, and the 
visibility controlling constant (β), so that the optimal route can be obtained. The test results show that the worst 
accuracy value was reached when α = and β = 0. On the other hand, the accuracy value is close to 100% for some 
combinations of the parameters such as (α = 0, β = 1), (α = 2, β = 1), (α = 0, β = 2), and (α = 1, β = 2) to 
(α = 2, β = 5). This shows that the accuracy value is close to the best result. Changes to the parameters α and β 
are the main priority in shortest path searching because the values produced are used as the probability value of 
pheromone. 

Keywords - shortest path; map visualization; Ant Colony algorithm; government agency location 

4. Paper 31101328: Determination of Multipath Security Using Efficient Pattern Matching (pp. 24-33) 

James Obert, Cyber R&D Solutions, Sandia National Labs, Albuquerque, NM, USA 

Huiping Cao, Computer Science Department, New Mexico State University, Las Cruces, NM, USA 

Abstract — Multipath routing is the use of multiple potential paths through a network in order to enhance fault 
tolerance, optimize bandwidth use, and improve security. Selecting data flow paths based on cost addresses 
performance issues but ignores security threats. Attackers can disrupt the data flows by attacking the links along the 
paths. Denial-of-service, remote exploitation, and other such attacks launched on any single link can severely limit 
throughput. Networks can be secured using a secure quality of service approach in which a sender disperses data 
along multiple secure paths. In this secure multi-path approach, a portion of the data from the sender is transmitted 
over each path and the receiver assembles the data fragments that arrive. One of the largest challenges in secure 
multipath routing is determining the security threat level along each path and providing a commensurate level of 
encryption along that path. The research presented explores the effects of real-world attack scenarios in systems, and 
gauges the threat levels along each path. Optimal sampling and compression of network data is provided via 
compressed sensing. The probability of the presence of specific attack signatures along a network path is determined 
using machine learning techniques. Using these probabilities, information assurance levels are derived such that 
security measures along vulnerable paths are increased. 

Keywords: Multi-path Security; Information Assurance; Anomaly Detection. 

5. Paper 31101338: On the Information Hiding Technique Using Least Significant Bits Steganography (pp. 
34-45) 

Samir El-Seoud, Faculty of Informatics and Computer Science, The British University in Egypt, Cairo, Egypt 
Islam Taj-Eddin, Faculty of Informatics and Computer Science, The British University in Egypt, Cairo, Egypt 

Abstract — Steganography is the art and science of hiding data, or the practice of concealing a message, image, or 
file within another message, image, or file. Steganography is often combined with cryptography so that even if the 
message is discovered it cannot be read. It is mainly used to keep data private and/or to secure confidential data 
against misuse by unauthorized persons. In contemporary terms, steganography has evolved into a digital 
strategy of hiding a file in some form of multimedia, such as an image, an audio file or even a video file. This paper 
presents a simple steganography method for encoding extra information in an image by making small modifications 
to its pixels. The proposed method focuses on one particularly popular technique, Least Significant Bit (LSB) 
embedding. The paper uses the LSB to embed a message into an image with 24-bit (i.e. 3-byte) color pixels, using 
the LSB of each of a pixel's bytes. The paper shows that using three bits from every pixel is robust and that the 
amount of change in the image is minimal and indiscernible to the human eye. For more protection of the 
message bits, a Stego-Key is used to permute the message bits before embedding them. A software tool that 
employs steganography to hide data inside other files (encoding), as well as software to detect such hidden files 
(decoding), has been developed and presented. 

Key Words — Steganography, Hidden-Data, Embedding-Stego-Medium, Cover-Medium, Data, Stego-Key, Stego- 
Image, Least Significant Bit (LSB), 24-bit color pixel, Histogram Error (HE), Peak Signal Noise Ratio (PSNR), 
Mean Square Error (MSE). 



6. Paper 30091320: Color and Shape Content Based Image Classification using RBF Network and PSO 

Abstract - Image classification can improve the accuracy of image query retrieval. Image 
classification is a well-known supervised learning technique, and an improved classification method 
increases the efficiency of image query retrieval. To improve the classification technique, we use a 
radial basis function (RBF) neural network for better prediction of the features used in image retrieval. Colour 
content is represented by pixel values in image classification using the RBF technique. This approach provides better 
results than the SVM technique in image representation. The image is represented by a matrix through the RBF using 
the pixel values of the colour intensity of the image. First, we use the RGB colour model, in which the red, green and 
blue colour intensity values are stored in the matrix. SVM with particle swarm optimization for image classification is 
implemented on the content of images. Results based on the proposed approach are found encouraging in terms of 
color image classification accuracy. 

Keywords: RBF network, PSO technique, image classification. 

7. Paper 30091321: A Survey: Various Techniques of Image Compression (pp. 51-55) 

Gaurav Vijayvargiya, Dr. Rajeev Pandey, Dr. Sanjay Silakari 
UIT-RGPV, Bhopal 

Abstract — This paper addresses various image compression techniques. On the basis of an analysis of these 
techniques, it presents a survey of existing research papers and analyzes 
the different types of existing image compression methods. Compression of an image is significantly different from 
compression of raw binary data, so different types of techniques are used for image compression. The 
question then arises of how to compress an image and which type of technique to use. For this purpose, 
two basic types of methods have been introduced, namely lossless and lossy image compression techniques. More 
recently, other techniques have been added to the basic methods. In some areas, neural networks and genetic 
algorithms are used for image compression. 

Keywords-Image Compression; Lossless; Lossy; Redundancy; Benefits of Compression. 

8. Paper 30091325: Optimization of Real-Time Application Network Using RSVP (pp. 56-62) 

Vikas Gupta (1), Baldev Raj (2) 

(1) Assistant Professor, Adesh Institute of Engineering and Technology, Faridkot, Punjab, India 

(2) Research Scholar, Adesh Institute of Engineering and Technology, Faridkot, Punjab, India 

Abstract — In this research work, the Resource Reservation Protocol (RSVP), which follows a receiver-oriented 
approach, is used. Two different networks have been designed and implemented using OPNET. In the first scenario, 
clients are available with and without the use of RSVP. In this scenario, the parameters that have been selected, 
simulated and analyzed are the reservation status message, the reservation and path states in all value mode, the 
traffic delay experienced in the form of the end-to-end delay parameter with and without RSVP, and the packet delay 
variation with and without RSVP. The analysis reveals that the attempted reservation status was successful, the 
number of reservation and path states was one, the end-to-end delay with RSVP was comparatively lower than 
without RSVP, and the packet delay variation for the node with RSVP was lower than that of the node not using 
RSVP. In another scenario, the network was duplicated but the link used for connecting the subnets was changed 
from DS1 (1.544 Mbps) to DS3 (44.736 Mbps). The parametric analysis indicated that the end-to-end delay and packet 
delay variation for the network with DS3 as the link were lower than for the network with DS1. 

Dr. S. Udaya Kumar, Principal, MVSR Engineering College, Nadergul. 

G. Aparna, Associate Professor ECE, Aurora's Engineering College, Bhongir 

Abstract - In today's world of information technology, image encryption can be used for providing privacy and for 
protecting intellectual property. During the transmission of images, the threat of unauthorized access may increase 
significantly. Image encryption can be used to minimize these problems. In the proposed image 
encryption scheme using the poly substitution method, we explore the possibility of taking advantage of genetic 
algorithm features. In polyalphabetic substitution ciphers, the plaintext letters are enciphered differently depending 
upon their placement in the text. As the name polyalphabetic suggests, this is achieved by using several keys: two 
keys, three keys and random key combinations. 

Keywords: Image Encryption, Decryption, Genetic algorithm, poly substitution. 



10. Paper 31071319: Local Intrusion Detection by Bluff Probe Packet (LIDBPP) in A mobile Ad Hoc Network 
(MANET) (pp. 66-69) 

Imad I. Saada and Majdi Z. Rashad 

Department of Computer Science, Faculty of Computer and Information Sciences, Mansoura University, Egypt 

Abstract - A mobile ad hoc network (MANET) is a collection of wireless nodes that are distributed without 
dependency on any permanent infrastructure. MANET security has been studied in recent years, for example the 
black hole threat, in which a malicious node makes the source believe that the path to the destination goes through 
it. Researchers have proposed secure routing ideas to counter these threats. The problem is that the security threats 
still exist because they are not prevented or avoided completely. In addition, some of the solutions adversely affect 
network performance, for example by adding network overhead and time delay. The main objective of this paper 
is to discuss some recent solutions that detect a black hole node using different strategies. One of these 
solutions, S-ZRP, is developed in this paper into a new proposed solution called local intrusion 
detection by bluff probe packet (LIDBPP). It begins detection locally at the previous node and not at the source 
node as in S-ZRP, which decreases the negative impact on the performance of the MANET, such as network overhead 
and time delay, in AODV-based MANETs. 

Keywords: LIDBPP, MANET, Black hole, AODV, Network security. 

G. T. Shakila Devi, Research Scholar, Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, 
India. 

Abstract - There are many non-Poisson queuing models. This paper mainly deals with the analysis of the non-Poisson 
queues (M/G/1) : (GD/∞/∞) and (M_i/G_i/1) : (NPRP/∞/∞). The feasibility of the system is analyzed based on 
numerical calculations and graphical representations. When the mean system size and the queue size are high, 
an optimized value is obtained so that the total expected cost is minimized. We outline here an approach that may be 
used to analyze a non-Poisson model which has job classes of multiple priorities. The priority discipline followed 
may be either non-preemptive or preemptive in nature. When the priority discipline is non-preemptive, a 
job in service is allowed to complete its service normally even if a job of higher priority enters the queue while its 
service is going on. In the preemptive case, the service to the ongoing job is preempted by the new arrival of 
higher priority. If the priority discipline is preemptive resume, then service to the interrupted job, when it restarts, 
continues from the point at which the service was interrupted. In the preemptive non-resume case, service already 
provided to the interrupted job is forgotten and its service is started again from the beginning. Note that there may be 
loss of work in the preemptive non-resume priority case. Such loss of work does not happen in the case of the other 
two priorities. Since the service times are assumed to be exponentially distributed, they satisfy the memoryless 
property and, therefore, the results are the same for both the preemptive resume and preemptive non-resume 
cases. 

Keywords- Pollaczek-Khintchine formula; Priority service discipline; Non-Poisson queues 

Abstract — Data mining has various applications in customer relationship management (CRM). In this proposal, I 
introduce a framework for identifying appropriate data mining techniques for various CRM activities. This 
research attempts to integrate the data mining and CRM models and to propose a new model of data mining for 
CRM. The new model specifies which types of data mining processes are suitable for which stages/processes of 
CRM. In order to develop an integrated model, it is important to understand the existing data mining and CRM 
models. Hence the article discusses some of the existing data mining and CRM models and finally proposes an 
integrated model of data mining for CRM. 

13. Paper 31101317: MAC Address as a Key for Data Encryption (pp. 83-87) 

Dr. Mohammed Abbas Fadhil Al-Husainy 
Jordan, Amman, Jordan 

Abstract- In computer networking, the Media Access Control (MAC) address is a unique value associated with a 
network adapter. MAC addresses are also known as hardware addresses or physical addresses. TCP/IP and other 
mainstream networking architectures generally adopt the OSI model. MAC addresses function at the data link layer 
(layer 2 in the OSI model). They allow computers to uniquely identify themselves on a network at this relatively low 
level. In this paper, a data encryption technique is presented that uses the MAC address as a key 
to authenticate the receiver device, such as a PC, mobile phone, laptop or any other device that is connected to the 
network. The technique was tested on some data, and visual and numerical measurements were used to check the 
strength and performance of the technique. The experiments showed that the suggested technique can be used easily 
to encrypt data that is transmitted through networks. 

Keywords: Crossover, Mutation, Information Security, Random, Distortion 

Abstract- Retinal exudate classification and the identification of diabetic retinopathy from fundus 
images require automation. This research work proposes a retinal exudate classification method. Representative 
features are obtained from the fundus images using a segmentation method. Fuzzy logic and the back propagation 
algorithm are trained to identify the presence of exudates in a fundus image. The presence of exudates is identified 
more clearly using fuzzy logic and the back propagation algorithm. By knowing the outputs of the proposed algorithm 
during testing, accurate diagnosis and prescription for treatment of the affected eyes can be done. Fifty fundus 
images are used for testing. The performance of the proposed algorithm is 96% (48 of the 50 images are classified). 
Simulation results show the effectiveness of the proposed algorithm in retinopathy classification. A very large 
database can be created from the fundus images collected from diabetic retinopathy patients and used for future work. 

Keywords: Diabetic retinopathy; fundus image; exudates detection; Fuzzy logic; back propagation algorithm. 

Abstract — In this paper, an integrated Certificateless Public 
Key Infrastructure (CLPKI) that focuses on key management 
issues is proposed. The proposed scheme provides two-factor 
private key authentication to protect the private key in case of 
device theft or compromise. The private key in the proposed 
scheme is not stored on the device; rather, it is calculated 
every time the user needs it. It also depends on a password 
chosen by the user, so even if the device is stolen, the attacker 
cannot obtain the private key without knowing the 
user's secret password. The proposed model provides other 
key management features such as private key recovery, private key 
portability and private key archiving. 

I. Introduction 

One main objective of public key cryptography is to 
establish a secure system that provides integrity, confidentiality, 
authentication and non-repudiation. Integrity and confidentiality 
are provided via symmetric cryptosystems such as 
AES, and non-repudiation is provided through digital 
signatures. Every user within the system has a public/private key 
pair, where the user's public key is used for symmetric key 
generation and signature verification. At this point, a key 
question is how the public key itself can be authenticated. 
That is, who can ensure that a particular public key belongs 
to the user that claims ownership? This problem is called 
public key authentication and has remained a challenge 
for almost every secure system based on public key 
cryptography. 

The Public Key Infrastructure (PKI) is a complete system 
for solving the above-mentioned problem. It provides public key 
authentication by binding entity information, such as the subject 
name and email address, to the entity's public key in a standard 
formatted document called a digital certificate. X.509 [1] is one of 
the most widely used digital certificate standards and is supported 
by the International Telecommunication Union. A digital 
certificate is issued according to a set of procedures and 
policies and then signed with a trusted certificate authority's 
(CA) private key. Each user within the system can use his/her 
certificate to provide confidentiality through encryption, or 
authentication and non-repudiation through digital signatures. 

Within any PKI system, each certificate has a validity period 
after which it expires and is consequently no longer valid. The PKI 
provides mechanisms to check the validity of a certificate. 
The most popular methods are the 
certificate revocation list (CRL) and the Online Certificate Status 
Protocol (OCSP). 
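
As a rough illustration of the two checks described above (validity period, then revocation status), the following Python sketch models a certificate as a plain record and a CRL as a set of revoked serial numbers. The `Certificate` class, its field names and `is_certificate_valid` are hypothetical names chosen for this example. A real X.509 certificate is an ASN.1 structure, and a real CRL or OCSP response must itself be signature-checked, which this sketch does not attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    """Hypothetical, simplified stand-in for an X.509 certificate."""
    serial: int
    subject: str
    not_before: datetime
    not_after: datetime


def is_certificate_valid(cert, now, revoked_serials):
    """Return True if `now` is inside the validity period and the serial is not revoked."""
    within_period = cert.not_before <= now <= cert.not_after
    not_revoked = cert.serial not in revoked_serials
    return within_period and not_revoked


cert = Certificate(
    serial=1001,
    subject="alice@example.org",
    not_before=datetime(2013, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2014, 1, 1, tzinfo=timezone.utc),
)
crl = {2002, 3003}  # serial numbers listed as revoked (illustrative values)
```

A relying party would call `is_certificate_valid(cert, datetime.now(timezone.utc), crl)` before trusting the public key in the certificate.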

X.509 specifies public key certificates, certificate revocation 
lists, attribute certificates, and a certification path 
validation algorithm. In the X.509 system, a certification 
authority (CA) issues a certificate binding a public key to a 
particular distinguished name in the X.500 [2] tradition, or to 
an alternative name such as an e-mail address or a DNS entry. 
An organization's trusted root certificates can be distributed to 
all employees so that they can use the company PKI system. 
Internet browsers such as MS Internet Explorer, Firefox, 
Opera, Safari and Chrome come with a predetermined set 
of root certificates pre-installed, so PKI certificates from larger 
vendors work instantly. In effect, the browsers' developers 
determine which CAs are trusted third parties for the browsers' 
users. 

X.509 also includes standards for certificate revocation 
list (CRL) implementations, an often neglected aspect of PKI 
systems. The IETF-approved way of checking a certificate's 
validity is the Online Certificate Status Protocol (OCSP). 
Many security protocols are based on the PKI, such as 
Secure Socket Layer (SSL), IPSec, S/MIME, VPN, and SSH 
protocols. 

Generally, the PKI suffers from two problems, namely scalability 
and certificate management [3]. Identity-based Public 
Key Cryptography (ID-PKC) [4] was introduced to address these two 
problems, but could not offer true non-repudiation due to the 
key escrow problem [3], [5]. In ID-PKC, an entity's public key 
is derived directly from certain aspects of its identity, for 
example, an IP address belonging to a network host, or an e-mail 
address associated with a user. Private keys are generated 
for entities by a trusted third party called a private key generator 
(PKG). The first fully practical and secure identity-based 
public key encryption scheme was presented in [6]. Since then, 
rapid development of ID-PKC has taken place. Currently, there 
exist identity-based key exchange protocols (interactive [7] 
as well as non-interactive [8]), signature schemes [9], [10], 
[11], hierarchical schemes [12] and a host of other primitives. 
It has also been illustrated in [13], [14], [15] how ID-PKC 
can be used as a tool to enforce what might be termed 
"cryptographic work-flows", that is, sequences of operations 
(e.g. authentications) that need to be performed by an entity in 
order to achieve a certain goal [3]. However, ID-PKC suffers from 
the key escrow problem: the PKG knows the private keys of all users 
in the system, and therefore ID-PKC cannot offer true non-repudiation. 
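
The key escrow problem can be made concrete with a small sketch. It assumes the extraction rule used by pairing-based identity-based schemes such as [6], where the private key is the master secret multiplied by the hashed identity. The group here is the toy additive group of integers modulo a small prime, which has no security at all; `ToyPKG`, `hash_to_group` and all parameter values are hypothetical names and numbers chosen only for illustration.

```python
import hashlib

Q_ORDER = 101  # toy prime group order (assumption; real schemes use ~256-bit orders)


def hash_to_group(identity: str) -> int:
    """Hypothetical H1: map an identity string to a nonzero element of Z_q."""
    digest = hashlib.sha256(identity.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (Q_ORDER - 1) + 1


class ToyPKG:
    """Toy private key generator that holds the master secret s."""

    def __init__(self, master_secret: int):
        self._s = master_secret % Q_ORDER

    def extract(self, identity: str) -> int:
        # Private key d_ID = s * Q_ID, where Q_ID = H1(identity) is public.
        return (self._s * hash_to_group(identity)) % Q_ORDER


pkg = ToyPKG(master_secret=57)
alice_key = pkg.extract("alice@example.org")      # issued to Alice at enrolment
escrowed_copy = pkg.extract("alice@example.org")  # recomputed later by the PKG alone
```

Because the private key depends only on the master secret and the public identity, the PKG can regenerate it at any time without Alice's involvement, which is why a signature made with it cannot be attributed to Alice alone.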

In 2003 Al-Riyami and Paterson [3] introduced the concept 
of Certificateless Public Key Cryptography (CL-PKC) to over- 
come the key escrow limitation of the identity-based public 
key cryptography (ID-PKC). In CL-PKC a trusted third party 
called Key Generation Center (KGC) supplies a user with a 
partial private key. Then, the user combines the partial private 
key with a secret value (that is unknown to the KGC) to obtain 
his full private key. In this way the KGC does not know the 
user's private key. Then the user combines his secret value 
with the KGC's public parameters to compute his public key. 

Certificateless cryptography is considered a combination 
of PKI and identity-based cryptography [3]. It 
combines the best features of PKI and ID-PKC, such 
as the lack of certificates, no key escrow, reasonable 
trust in the trusted authority and a lightweight infrastructure [16]. It 
provides a solution to the non-repudiation problem by 
enabling a user to generate his/her full long-term private key, 
so that the trusted third party is unable to impersonate the 
user. Certificateless cryptography schemes have 
appeared in the literature, including certificateless 
encryption [5], [17], certificateless signatures [18], [19], 
[20] and certificateless signcryption [21], [22], [23]. 

Almost all the CL-PKC schemes found in the literature focus 
on algorithms for public parameter generation, public/private 
key generation for the system's parties, and encryption and decryption 
processes, but leave many key problems without clear solutions. 
Such problems include how and where the system parameters are 
published, which authentication method can 
be used between the users and the KGC server, what the users 
should do if the KGC updates its parameters and how they can 
be notified, what the format of the elements of the CL-PKC 
system is, and so forth. 

There are also other challenges regarding trust models, such 
as determining whether the traditional PKI trust models 
can be applied to CLPKI, whether a PKI can be migrated 
to CLPKI, and whether an existing PKI-based system can be 
integrated with another CLPKI-based system. 

In this paper, an integrated model of Certificateless Public 
Key Infrastructure (CLPKI) is proposed. It is assumed that 
there exists a Registration Authority (RA), which is responsible 
for user registration in the system, and a Key Generation 
Center (KGC), which generates the system parameters 
and the master secret, publishes the system parameters in the 
public directory (PD) and keeps the master secret secure. The 
proposed model provides a strong key management mechanism 
by separating the generation of the public key from that of the 
private key. If it is well controlled, this separation protects the 
private key against device theft or key compromise, because the 
private key is never stored on the user's device; only the 
hashed value of the user's secret key is stored, along with the user's 
partial private key. This separation also provides private key 
recovery, private key archiving and private key portability. In addition, 
the proposed model provides silent and transparent private key 
revocation when the public key is compromised or has expired. 
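
The idea of recomputing secret material from a password instead of storing it can be sketched as follows. This is not the construction of the proposed model, which is not specified in this section; it is a minimal illustration that swaps in PBKDF2 from the Python standard library as the password-based derivation step. The group order, salt, iteration count and the name `derive_secret_value` are all assumptions made for the example.

```python
import hashlib

GROUP_ORDER = 2**127 - 1  # hypothetical prime group order q, chosen only for illustration


def derive_secret_value(password: str, salt: bytes, iterations: int = 100_000) -> int:
    """Recompute a secret scalar in [1, q-1] from the password on demand."""
    raw = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return int.from_bytes(raw, "big") % (GROUP_ORDER - 1) + 1


salt = b"per-user-salt-01"  # could be stored on the device; it is not secret
x_first = derive_secret_value("correct horse battery", salt)
x_again = derive_secret_value("correct horse battery", salt)
x_wrong = derive_secret_value("wrong guess", salt)
```

The same password always yields the same scalar, so nothing secret has to be written to the device, while a thief who holds the device but guesses the password wrongly obtains an unrelated value.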

The rest of this paper is organized as follows. Section II 
gives background on elliptic curves and pairings. In Section 
III, we introduce the concept of certificateless public key 
cryptography. In Section IV, we introduce the proposed certificateless 
public key infrastructure model. Section V discusses 
the security properties provided by the proposed model. Finally, 
Section VI concludes the paper. 

II. Background 

In this section we give some background on pairings on 
elliptic curves, pairing-based cryptography, certificateless public 
key cryptography, password-based encryption, the challenge-response 
authentication method and certificateless key agreement 
protocols. 

A. Pairings on Elliptic Curves 

Throughout the paper, $G_1$ denotes an additive group of 
prime order $q$ and $G_2$ a multiplicative group of the same order. 
We let $P$ denote a generator of $G_1$. For us, a pairing is a map 
$e : G_1 \times G_1 \to G_2$ with the following properties: 

1) The map $e$ is bilinear: given $Q, W, Z \in G_1$, we have 
$e(Q, W + Z) = e(Q, W) \cdot e(Q, Z)$ and 
$e(Q + W, Z) = e(Q, Z) \cdot e(W, Z)$. 

Consequently, for any $a, b \in \mathbb{Z}_q$, we have 
$e(aQ, bW) = e(Q, W)^{ab} = e(abQ, W)$, etc. 

2) The map $e$ is non-degenerate: $e(P, P) \neq 1_{G_2}$. 

3) The map $e$ is efficiently computable. 

Typically, the map e will be derived from either the Weil or 
Tate pairing on an elliptic curve over a finite field. We refer to 
[24], [25], [6], [26], [27], [28], [29], [30] for a more compre- 
hensive description of how these groups, pairings and other 
parameters should be selected in practice for efficiency and 
security. We also introduce here the computational problems 
that will form the basis of security for our CL-PKC schemes. 
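
The bilinearity and non-degeneracy properties can be checked numerically on a toy map. The sketch below is not a Weil or Tate pairing and offers no security: it takes G1 to be the integers modulo a small prime q under addition, G2 to be the order-q subgroup of the integers modulo p under multiplication, and defines e(u, v) = g^(uv). All parameter values and names (`P_MOD`, `Q_ORDER`, `G2_GEN`, `pairing`) are assumptions chosen for illustration.

```python
P_MOD = 607                       # prime with q | (p - 1); toy size only
Q_ORDER = 101                     # prime order shared by G1 and G2
G2_GEN = pow(2, (P_MOD - 1) // Q_ORDER, P_MOD)  # element of order q in Z_p^*
P_GEN = 1                         # generator P of the additive group G1 = Z_q


def pairing(u: int, v: int) -> int:
    """Toy bilinear map e(u, v) = g^(u*v) from Z_q x Z_q into the order-q subgroup of Z_p^*."""
    return pow(G2_GEN, (u * v) % Q_ORDER, P_MOD)


Q, W, Z = 17, 45, 88              # arbitrary elements of G1
a, b = 5, 9                       # arbitrary scalars in Z_q

# Bilinearity in the second argument: e(Q, W + Z) = e(Q, W) * e(Q, Z).
left = pairing(Q, (W + Z) % Q_ORDER)
right = (pairing(Q, W) * pairing(Q, Z)) % P_MOD

# Consequence: e(aQ, bW) = e(Q, W)^(ab).
scaled = pairing((a * Q) % Q_ORDER, (b * W) % Q_ORDER)
```

In this toy group the discrete logarithm in G1 is trivial, so the map only demonstrates the algebraic identities, not the hardness assumptions that real pairings rely on.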

B. Bilinear Diffie-Hellman Problem (BDHP): 

Let $G_1$, $G_2$, $P$ and $e$ be as above. The BDHP in $(G_1, G_2, e)$ 
is as follows: Given $(P, aP, bP, cP)$ with uniformly random 
choices of $a, b, c \in \mathbb{Z}_q^*$, compute $e(P, P)^{abc} \in G_2$. An 
algorithm $\mathcal{A}$ has advantage $\epsilon$ in solving the BDHP in $(G_1, G_2, e)$ 
if: 

$\Pr[\mathcal{A}(P, aP, bP, cP) = e(P, P)^{abc}] = \epsilon$. Here the probability 
is measured over the random choices of $a, b, c \in \mathbb{Z}_q^*$ and the 
random bits of $\mathcal{A}$. 
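
As a worked step that is not part of the original text, the bilinearity property of Section II-A shows how the BDHP target relates to the public values:

```latex
\begin{align*}
e(P,P)^{abc} &= e(bP, cP)^{a} = e(aP, cP)^{b} = e(aP, bP)^{c}.
\end{align*}
```

Anyone who knows one of the exponents $a$, $b$ or $c$ can therefore compute the target from the other two public points. The problem is assumed hard only for a party that knows none of them.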

C. BDH Parameter Generator: 

As in [6], a randomized algorithm $\mathcal{IG}$ is a BDH parameter 
generator if $\mathcal{IG}$: 

1) takes security parameter $k > 1$, 

2) runs in polynomial time in $k$, and 

3) outputs the description of groups $G_1$, $G_2$ of prime order 
$q$ and a pairing $e : G_1 \times G_1 \to G_2$. 

Formally, the output of the algorithm $\mathcal{IG}(1^k)$ is $(G_1, G_2, e)$. 
Other computational problems related to pairings are also assumed to be 
infeasible to solve in polynomial time [6], [27]: 

1) Elliptic Curve Discrete Logarithm Problem: Given 
$P, Q \in G_1$, find an element $a \in \mathbb{Z}_q^*$ such that $Q = aP$. 

2) Computational Elliptic Curve Diffie-Hellman Problem: 
Given $(P, aP, bP)$ in $G_1$ where $a, b \in \mathbb{Z}_q^*$, compute $abP$. 
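
To make the Elliptic Curve Discrete Logarithm Problem concrete, the sketch below solves it by exhaustive search on a deliberately tiny curve, $y^2 = x^3 + 2x + 2$ over the field of 17 elements, with base point (5, 1) generating a group of order 19. The curve, the point and all function names are assumptions chosen for illustration and do not come from the paper. The search succeeds only because the group has 19 elements; for groups of cryptographic size no efficient method is known. The code needs Python 3.8 or later for the modular inverse via `pow`.

```python
FIELD_P = 17              # toy field size (assumption; real curves use ~256-bit primes)
CURVE_A, CURVE_B = 2, 2   # curve y^2 = x^3 + 2x + 2 over F_17
BASE_POINT = (5, 1)       # generator P of a group of prime order 19
INFINITY = None           # identity element of the group


def on_curve(point):
    """Check that a point satisfies the curve equation."""
    if point is INFINITY:
        return True
    x, y = point
    return (y * y - (x ** 3 + CURVE_A * x + CURVE_B)) % FIELD_P == 0


def point_add(p1, p2):
    """Add two points on the toy curve using the chord-and-tangent rule."""
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return INFINITY  # p2 is the inverse of p1
    if p1 == p2:
        slope = (3 * x1 * x1 + CURVE_A) * pow(2 * y1, -1, FIELD_P)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P)
    x3 = (slope * slope - x1 - x2) % FIELD_P
    y3 = (slope * (x1 - x3) - y1) % FIELD_P
    return (x3, y3)


def scalar_mult(k, point):
    """Compute kP by double-and-add."""
    result, addend = INFINITY, point
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def brute_force_dlog(base, target, order):
    """Solve Q = aP by trying every a; feasible only because the group is tiny."""
    current = INFINITY
    for a in range(order):
        if current == target:
            return a
        current = point_add(current, base)
    return None
```

For example, `brute_force_dlog(BASE_POINT, scalar_mult(13, BASE_POINT), 19)` recovers the exponent 13 after at most 19 additions.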

III. Certificateless Public Key Cryptography 
(CL-PKC) 

In 2003, Al-Riyami and Paterson [3] introduced the concept 
of Certificateless Public Key Cryptography (CL-PKC) 
to overcome the key escrow limitation of identity-based 
cryptography. In CL-PKC, a trusted third party called the Key 
Generation Center (KGC) supplies a user with a partial private 
key. The user then combines the partial private key with a secret 
value (unknown to the KGC) to obtain his/her full private key. 
In this way the KGC does not know users' private keys. The user 
also combines the same secret value with the KGC's public 
parameters to compute his/her public key. 

Compared to Identity-based Public Key Cryptography (ID- 
PKC), the trust assumptions made of the trusted third party 
in CL-PKC are much reduced. In ID-PKC, users must trust 
the private key generator (PKG) not to abuse its knowledge of 
private keys in performing passive attacks, while in CL-PKC, 
users need only trust the KGC not to actively propagate false 
public keys [3]. 

In CL-PKC, users can generate more than one 
key pair (private and public) for the same partial private key. To 
guarantee that the KGC does not replace users' public keys, Al-Riyami 
and Paterson [3] introduced a binding technique that binds 
a user's public key to his/her private key. In their binding 
scheme, the user first fixes his/her secret value and his/her 
public key and supplies the KGC with the public key. The 
KGC then redefines the identity of the user to be the user's identity 
concatenated with his/her public key. With this binding scheme, 
replacement of a public key by the KGC becomes apparent, and is 
equivalent to a CA forging a certificate in a traditional PKI. 

A. Al-Riyami and Paterson Scheme 

In this section we give a general description of the Setup,
Set-Secret-Value, Partial-Private-Key-Extract, Set-Private-Key
and Set-Public-Key algorithms as introduced by Al-Riyami and
Paterson [3].

Let $k$ be a security parameter given to the Setup algorithm
and $\mathcal{IG}$ be a Bilinear Diffie-Hellman (BDH) parameter
generator with input $k$.

1) Setup (run by the KGC): this algorithm runs as
follows:

a) Run $\mathcal{IG}$ on input $k$ to generate output
$\langle G_1, G_2, e \rangle$, where $G_1$ and $G_2$ are groups of some
prime order $q$ and $e : G_1 \times G_1 \to G_2$ is a pairing.

b) Choose an arbitrary generator $P \in G_1$.

c) Select a master-key $s$ uniformly at random from
$Z_q^*$ and set $P_0 = sP$.

d) Choose cryptographic hash functions

$H_1 : \{0,1\}^* \to G_1^*$

and

$H_2 : G_2 \to \{0,1\}^n$

where $n$ is the bit-length of plaintexts taken from
some message space $\mathcal{M} = \{0,1\}^n$ with a
corresponding ciphertext space $\mathcal{C} = G_1 \times \{0,1\}^n$.
Then, the KGC publishes the system parameters
$params = \langle G_1, G_2, e, n, P, P_0, H_1, H_2 \rangle$, while the
secret master-key $s$ is stored securely by the KGC.

2) Set-Secret-Value (run by the user): The inputs of
this algorithm are $params$ and entity $m$'s identifier $ID_m$.
It selects $x_m \in Z_q^*$ at random and outputs $x_m$ as $m$'s
secret value. Then, the entity $m$ computes $X_m = x_m P$
and sends $X_m$ to the KGC.

3) Partial-Private-Key-Extract (run by the KGC):
The inputs of this algorithm are an identifier
$ID_m \in \{0,1\}^*$ and $X_m$. The algorithm carries out the following
steps to construct the partial private key for entity $m$
with identifier $ID_m$.

• Compute $Q_m = H_1(ID_m \| X_m)$.

• Output the partial private key $D_m = sQ_m \in G_1^*$.

Once entity $m$ holds its partial private key $D_m$,
it can verify the correctness of $D_m$
by checking $e(D_m, P) = e(Q_m, P_0)$.

4) Set-Private-Key (run by the user): The inputs of
this algorithm are $params$, $D_m$ (the partial private key
of entity $m$) and $x_m \in Z_q^*$ (the secret value of entity $m$).
It transforms the partial private key $D_m$ into a private key
$S_m$ by computing $S_m = x_m D_m = x_m s Q_m \in G_1^*$.

5) Set-Public-Key (run by the user): The inputs of
this algorithm are $params$ and $x_m \in Z_q^*$, which is the
secret value of entity $m$. It then constructs the public
key of entity $m$ as $P_m = \langle X_m, Y_m \rangle$, where
$X_m = x_m P$ and $Y_m = x_m P_0 = x_m s P$.
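The five algorithms above can be traced end to end with a toy model. The sketch below is an assumption-laden illustration, not an implementation of the scheme: a $G_1$ element $aP$ is stored as the integer $a \bmod q$ (so discrete logarithms are trivial and nothing here is secure), the "pairing" is $e(aP, bP) = g^{ab} \bmod p$, and the small primes, the SHA-256 stand-in for $H_1$ and the example identity are all hypothetical choices.

```python
import hashlib
import secrets

# Toy "pairing": a G1 element aP is stored as the integer a mod Q, and
# e(aP, bP) = G^(a*b) mod P_MOD. Insecure by construction; it only
# reproduces the bilinearity that the checks rely on.
Q = 1019        # prime group order q
P_MOD = 2039    # prime with P_MOD = 2*Q + 1
G = 4           # generator of the order-Q subgroup of Z_p^*
P = 1           # the generator P of G1 in exponent representation


def pairing(u: int, v: int) -> int:
    return pow(G, (u * v) % Q, P_MOD)


def h1(identity: str, x_pub: int) -> int:
    """Toy stand-in for H_1: hash ID || X_m to a nonzero element of G1."""
    digest = hashlib.sha256(f"{identity}|{x_pub}".encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1


def random_scalar() -> int:
    return secrets.randbelow(Q - 1) + 1  # uniform in Z_q^*


# 1) Setup (KGC): master key s and P_0 = sP
s = random_scalar()
P0 = s * P % Q

# 2) Set-Secret-Value (user): x_m and X_m = x_m P
x_m = random_scalar()
X_m = x_m * P % Q

# 3) Partial-Private-Key-Extract (KGC): Q_m = H_1(ID || X_m), D_m = s Q_m
Q_m = h1("alice@example.org", X_m)
D_m = s * Q_m % Q

# User-side check of the partial private key: e(D_m, P) == e(Q_m, P_0)
partial_key_ok = pairing(D_m, P) == pairing(Q_m, P0)

# 4) Set-Private-Key: S_m = x_m D_m;  5) Set-Public-Key: <X_m, x_m P_0>
S_m = x_m * D_m % Q
Y_m = x_m * P0 % Q
public_key = (X_m, Y_m)
```

In this representation `partial_key_ok` holds because both sides reduce to $g^{s Q_m}$, which is exactly the bilinearity argument behind the check in step 3.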

The purpose of the binding technique used in the Al-Riyami and
Paterson scheme [3] is to force each user to have a single
public/private key pair in the system. If two working
public keys exist for a user, then the second key was generated by
the KGC, which is equivalent to CA certificate forgery in a
traditional PKI. Several modifications of the original
Al-Riyami and Paterson scheme [3] have appeared in the literature.
For example, Mokhtarnameh et al. [31] proposed a small
modification of the original scheme that sets the user's
public key to $P_A = x_A Q_A$, and used the new public key in a
two-party key agreement protocol proposed in the same paper.
Yang et al. [16] showed that the two-party key agreement
protocol of Mokhtarnameh et al. [31] is vulnerable to a
man-in-the-middle attack, and also explained that the scheme of
Mokhtarnameh et al. [31] does not provide the claimed one-to-one
correspondence between the user's identity and the user's public
key. Mohammed et al. [32] explained that the schemes of
Mokhtarnameh et al. [31] and Yang et al. [16] suffer from the
key escrow problem: the KGC can compute the user's
private key as $S_A = sY_A$, because the public key component is
$Y_A = x_A Q_A$.
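The escrow observation can be written out in one line. The derivation below assumes, as in the original scheme, that the full private key is $S_A = x_A D_A$ with partial private key $D_A = sQ_A$; only commutativity of scalar multiplication is used.

```latex
\[
  S_A \;=\; x_A D_A \;=\; x_A (s Q_A) \;=\; s\,(x_A Q_A) \;=\; s\,Y_A .
\]
```

Since $s$ is the KGC's own master key and $Y_A$ is published, the KGC needs no knowledge of $x_A$ to obtain $S_A$.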

B. The Mohammed et al. Authenticated Key Agreement Protocol

In [32], Mohammed et al. proposed a modified certificateless
public key cryptography scheme and binding technique, and used
their scheme to create an authenticated two-party key agreement
protocol that requires no interaction between the parties. In
this section we describe the algorithms of the Mohammed et al.
scheme [32].

1) Setup (run by the KGC): the KGC chooses
a security parameter $k$ to generate $G_1, G_2, P, e$, where
$G_1$ and $G_2$ are two groups of prime order $q$, $P$ is
a generator of $G_1$ and $e : G_1 \times G_1 \to G_2$ is a
bilinear map. The KGC randomly generates the system's
master key $s \in Z_q^*$ and computes the system public
key $P_{pub} = sP$. Then the KGC chooses cryptographic
hash functions $H_1$ and $H_2$, where
$H_1 : \{0,1\}^* \times G_1 \to G_1$ and
$H_2 : G_1 \times G_1 \times G_1 \times G_2 \to \{0,1\}^n$.
Finally, the KGC publishes the system parameters
$params = \langle G_1, G_2, e, P, P_{pub}, H_1, H_2, n \rangle$, while
the secret master-key is stored securely by the KGC.

2) Set-Secret-Value (run by the user): the user $m$
with the identity $ID_m$ downloads the system's public
parameters from the KGC. Then, he/she picks two
random secret values $x_m, x'_m \in Z_q^*$. Then, the user $m$
computes $X_m = x_m P$ and sends $X_m$ to the KGC.

3) Partial-Private-Key-Extract (run by the KGC):
on receiving $X_m$ computed by user $m$ with identity
$ID_m$, the KGC first computes $Q_m = H_1(ID_m \| X_m)$,
then it generates the partial private key of user $m$ as
$D_m = sQ_m$. Once user $m$ holds the partial private
key $D_m$, he/she can verify its correctness
by checking $e(D_m, P) = e(Q_m, P_{pub})$.

4) Set-Private-Key (run by the user): when user $m$
receives $D_m$ from the KGC, he/she computes his/her
full private key $S_m = x_m D_m$.

5) Set-Public-Key (run by the user): the user $m$ with
identity $ID_m$ computes $Q_m = H_1(ID_m \| X_m)$ and
$Y_m = x_m x'_m Q_m$, and sets $\langle X_m, Y_m \rangle$ as his/her long-term
public key $P_m$. Finally, user $m$ sends $Y_m$ to the KGC.

The purpose of the secret value $x'_m$ is to prevent the key
escrow attack that could otherwise be mounted by the KGC.

In our modified scheme we make a slight modification to the
Mohammed et al. scheme [32] that improves efficiency without
changing the security level of the scheme.

IV. The Proposed Certificateless Public Key
Infrastructure Model

The motivation of this paper is to propose a new CLPKI
model. The proposed model is based on a modified version of
the Mohammed et al. scheme [32] that was introduced
in Section III-B.

A. The Modified Certificateless Cryptography Scheme

In this scheme the Setup and Partial-Private-Key-Extract
algorithms are the same as in the Mohammed et al. scheme [32].

• Set-Secret-Value (run by the user): the user $m$ with
the identity $ID_m$ downloads the system parameters and picks
two random secret values $x_m, x'_m \in Z_q^*$. Then, the user
$m$ computes $X_m = x'_m P$ and sends $X_m$ to the KGC.
To provide two-factor authentication and to protect the
user's private key in case of device theft or compromise,
the proposed scheme then requires the user to choose a
strong password $pass$. The client system hashes the
password to obtain $z_m = H(pass)$ and multiplies the base point
$P$ by the hashed password to obtain $z_m P$. (A special hash
function is used to preserve the large size of the hashed value
$z_m$, which prevents a brute-force attack on the point $z_m P$
that would reveal the user's hashed password.) The client then
uses the hashed value $z_m$ as the key of the MAC function to
encrypt the secret value $x_m$ as $MAC_{z_m}(x_m)$, sends a copy
of it to the KGC's public directory and stores a copy of it
locally along with the point $z_m P$. Note that there is no need
to store the password $pass$ or its hash value $z_m$.

• Set-Public-Key (run by the user): the user $m$ with
identity $ID_m$ computes $Q_m = H_1(ID_m \| X_m)$ and
$Y_m = x'_m Q_m$, and sets $\langle X_m, Y_m \rangle$ as his/her long-term
public key $P_m$. Finally, user $m$ sends $Y_m$ to the KGC.

• Set-Private-Key (run by the user): every time the
user needs to calculate and use his/her full private key,
he/she enters his/her password. The system hashes it as
$z'_m$, calculates $z'_m P$ and compares it with the stored
$z_m P$. If they are equal, the password is correct and the
user is authentic; the system then uses $z_m$ as the key to
decrypt the stored $MAC_{z_m}(x_m)$ and uses the extracted $x_m$
to calculate the full private key as $(x_m + z_m) D_m$. Otherwise,
the system aborts the process. Note that the
private key is never stored on the client and is
deleted after every use.
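The password-dependent key derivation described above can be sketched as follows. Several choices here are assumptions made only for illustration: the multiplicative group modulo the Mersenne prime $2^{127} - 1$ stands in for the elliptic-curve group (so the point $z_m P$ becomes `pow(G, z_m, P_MOD)`), SHA-256 stands in for $H$, and, because the text does not specify how a MAC yields a value that can be decrypted, the secret $x_m$ is wrapped by XOR with an HMAC-SHA256 output keyed by $z_m$. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy group: Z_p^* with p = 2^127 - 1 stands in for the elliptic-curve
# group, so the "point" z_m P becomes pow(G, z_m, P_MOD). Assumption only.
P_MOD = 2**127 - 1
ORDER = P_MOD - 1
G = 3


def hash_password(password: str) -> int:
    """z_m = H(pass), reduced to a scalar."""
    digest = hashlib.sha256(password.encode()).digest()
    return int.from_bytes(digest, "big") % ORDER


def _pad(z_m: int) -> int:
    """128-bit pad from HMAC-SHA256 keyed by z_m (assumed construction)."""
    key = z_m.to_bytes(16, "big")
    tag = hmac.new(key, b"secret-value", hashlib.sha256).digest()
    return int.from_bytes(tag[:16], "big")


def enroll(password: str):
    """Set-Secret-Value: return (x_m, record); only the record is stored."""
    x_m = secrets.randbelow(ORDER - 1) + 1
    z_m = hash_password(password)
    record = {"zP": pow(G, z_m, P_MOD), "wrapped_x": x_m ^ _pad(z_m)}
    return x_m, record


def derive_private_key(password: str, record: dict, d_m: int) -> int:
    """Set-Private-Key: recompute (x_m + z_m) D_m, or abort on a bad password."""
    z_m = hash_password(password)
    if pow(G, z_m, P_MOD) != record["zP"]:
        raise PermissionError("wrong password")
    x_m = record["wrapped_x"] ^ _pad(z_m)
    # Scalar multiplication (x_m + z_m) D_m becomes exponentiation here.
    return pow(d_m, (x_m + z_m) % ORDER, P_MOD)
```

The stored record contains neither the password nor $z_m$, which mirrors the note at the end of Set-Secret-Value; the returned key is meant to be used and then discarded by the caller.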
Instead of computing $X_m$ and $Y_m$ from both $x_m$ and $x'_m$,
the modified scheme computes them from $x'_m$ only. In this way,
the number of field multiplications is decreased. The proposed
scheme assumes that the user enters his/her password every time
he/she needs the full private key, calculates the private
key as described above, uses it and then deletes it, so that the
private key is never stored on the device. If it is controlled
well, this separation of the calculation of the public and
private keys is a very useful feature for public key revocation
when the private key is compromised or stolen. Other features
provided by this separation are private key recovery, private
key portability and private key archiving. Furthermore, the
proposed scheme provides two-factor authentication, since the
authentic user needs to have the device that stores the
secret number $x_m$ as the first factor and must then authenticate
himself/herself to the device with the correct password. Because
the hashed value of the user's password is involved in the private
key calculation, even if an attacker somehow obtains the user's
device, he/she cannot calculate the private key because he/she
does not know the user's password.

TABLE I

Record 1: contains the KGC's public system parameters with
the timestamp of the update

Timestamp | System Parameters

We will use this modified scheme in our model to provide 
strong key management mechanisms based on certificateless 
cryptography. 

In our scheme, user A can check the authenticity
of the KGC's system parameters by validating the equality
$e(D_A, P) = e(sQ_A, P) = e(Q_A, sP) = e(Q_A, P_{pub})$, and
user B can check the authenticity of user A's public key
$P_A = (X_A, Y_A)$ by validating the equality
$e(X_A, Q_A) = e(x'_A P, Q_A) = e(P, x'_A Q_A) = e(P, Y_A)$. Validating
the user's partial private key (downloaded from the KGC) and
public key by using pairings in certificateless cryptography
is equivalent to verifying the CA's signature on the user's
certificate in a traditional PKI.

B. Components of the proposed CLPKI 

In this section we describe the components of the proposed 
CLPKI and their functions as follows: 

1) The Registration Authority (RA): it functions as in a
traditional PKI. The user interacts with this authority,
fills in a registration form and provides personal
information such as name, address, national ID number and
email address. After the RA authenticates the information
of the user, it gives the user a unique, randomly generated
password for later authentication purposes. In some
cases the RA may give the user the system parameters
generated by the KGC server on a token or other
electronic media.

2) The Key Generation Center (KGC): responsible for
generating its master secret and the system parameters,
keeping its master secret in secure storage and publishing the
system parameters in a public directory. The KGC also has a
database that holds the user identities with their passwords
hashed by a strong cryptographic hash function
like MD5 or SHA-1.

3) The KGC's Public Directory (PD): a public
directory system that holds the KGC's public
parameters, users' identities, users' partial private keys,
users' public keys and other user parameters. It should be
well controlled, updated only by the KGC, and read-only and
accessible only to authenticated users. The typical formats
of the public directory records are given in Table I and
Table II.

Typically the RA has an offline connection with the KGC, and
the password that is given to the user by the RA is originally
generated and stored with the user's ID by the KGC (the RA
should just pass it on to the user without learning it).
To register a user in the system, the registration process
starts from the RA, where the user presents his/her identity
proof together with the other required information to the
registration authority. The RA then gives the user a password
and the system's parameters. In another scenario, the user
downloads the KGC's system parameters from the public
directory. Both the user and the server need to authenticate
each other's identities. Therefore, it is necessary to set up
a robust authentication technique for both the user and the
server.

TABLE II

Record 2: contains the user's identity information along with
the user's public key and partial private key; the timestamp
indicates the update time

Timestamp | UserID | $Q_{ID}$ | $X_{ID}$ | $Y_{ID}$

C. Authentication at first time access 

In this part, we introduce two techniques for the client
to authenticate the server. The first technique is based on the
server's response to a client challenge, whereas the second
technique is based on authentication of the
server via a digital certificate. Below we detail each.

1) The challenge-response method: The challenge-response
method works as follows.

1) The user hashes his/her password and uses the hash value
as the symmetric encryption key with any symmetric
cryptosystem agreed with the KGC server, such as AES.

2) The user generates a random nonce $n_1$, encrypts it
using the symmetric key and sends the encrypted message
to the KGC.

3) On receiving the encrypted message from the user,
the KGC decrypts the message using the stored
hashed password of the user as the decryption key to extract
the nonce $n'_1$. Then, the KGC concatenates the nonce $n'_1$ with
the URL $url$ of the download link of the record in the public
directory. Finally, it encrypts the message $n'_1 \| url$ using
the same user's key and sends the encrypted message
back to the user.

4) The user decrypts the encrypted message to extract the
nonce $n'_1$ and the URL $url$. The user then verifies the
validity of the nonce: if $n'_1$ is equal to $n_1$, then he/she
uses the $url$ link to download the public parameters from
the public directory (not necessarily over a secure channel).

If the KGC server needs to authenticate the user (for example,
when a partial private key is downloaded), this mechanism can be
extended as follows:

1) The user hashes his/her password and uses this hash
value as the symmetric encryption key with any symmetric
cryptosystem agreed with the KGC server, such as AES.

2) The user generates a random nonce $n_1$, encrypts it
using the key and sends the encrypted message to the KGC.

3) The KGC receives the encrypted message from the
user. The KGC decrypts the message using the stored
hashed password of the user as the decryption key to extract
the nonce $n'_1$. Then, the KGC generates another nonce
$n_2$ and concatenates it with the nonce $n'_1$. It
encrypts the message $n'_1 \| n_2$ using the same user's key
and sends the encrypted message back to the user.

4) The user decrypts the encrypted message to extract the
nonces $n'_1$ and $n'_2$. The user then verifies the
validity of the nonce: if $n'_1$ equals $n_1$, he/she
authenticates the KGC. The user then encrypts the nonce $n'_2$
and sends it back to the KGC for user authentication.

5) The KGC decrypts the message to obtain $n''_2$ and compares
it with its value $n_2$. If they are equal, the server trusts
the user and sends him/her the download link $url$ of the
requested information (the partial private key, for example),
encrypted with the user's key.

6) The user decrypts the URL and uses it to download the
requested information from the public directory.
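The six steps of the extended exchange can be sketched in a single function that plays both roles. This is a minimal illustration under stated assumptions: the standard library has no AES, so a SHA-256 counter-mode keystream is used as a stand-in cipher (it provides no integrity and is not a recommendation), SHA-256 is assumed as the password hash, and the URL and function names are hypothetical.

```python
import hashlib
import secrets

NONCE_LEN = 16


def derive_key(password: str) -> bytes:
    """Hash the password into a symmetric key (SHA-256 assumed)."""
    return hashlib.sha256(password.encode()).digest()


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for AES: XOR with a hash-derived keystream."""
    iv = secrets.token_bytes(16)
    stream = _keystream(key, iv, len(plaintext))
    return iv + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    iv, body = blob[:16], blob[16:]
    stream = _keystream(key, iv, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def run_handshake(user_password: str, stored_key: bytes, url: str):
    """Return the URL the user obtains, or None if either side rejects."""
    user_key = derive_key(user_password)
    # Steps 1-2: the user sends E_k(n1).
    n1 = secrets.token_bytes(NONCE_LEN)
    msg1 = encrypt(user_key, n1)
    # Step 3: the KGC recovers n1', draws n2 and replies with E_k(n1' || n2).
    n1_seen = decrypt(stored_key, msg1)
    n2 = secrets.token_bytes(NONCE_LEN)
    msg2 = encrypt(stored_key, n1_seen + n2)
    # Step 4: the user checks n1' == n1 (KGC is authentic), returns E_k(n2').
    reply = decrypt(user_key, msg2)
    if reply[:NONCE_LEN] != n1:
        return None
    msg3 = encrypt(user_key, reply[NONCE_LEN:])
    # Step 5: the KGC checks n2'' == n2 (user is authentic), sends E_k(url).
    if decrypt(stored_key, msg3) != n2:
        return None
    msg4 = encrypt(stored_key, url.encode())
    # Step 6: the user decrypts the download link.
    return decrypt(user_key, msg4).decode()
```

A caller holding the right password receives the URL; with a wrong password the nonce check in step 4 fails, because the two sides derive different keys.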

Note that with this method, the attacker cannot carry
out a replay attack, because the user generates a
new nonce every time. Also, the attacker cannot carry out a
password dictionary attack, because the password is hashed and
used as the encryption/decryption key.

2) The certificate-based method: The certificate-based
method is another way for the users to authenticate the KGC
server. It is a hybrid model that incorporates the traditional
PKI and the model originally used by Chen et al. in [13]. In
this model the trusted Certificate Authority (CA) generates and
signs a digital certificate (typically in X.509 format) for the
KGC server that contains the KGC's identity information (in the
Subject field) and its public parameters (in the Public Key
field). The users then authenticate the KGC by using this
certificate as follows:

1) The user sends a request to the KGC server. 

2) The KGC sends its certificate back to the user. 

3) The user checks the validity of the certificate. If it is
valid, the user authenticates the KGC and extracts
the public parameters from the certificate. Otherwise, the
user rejects the certificate and aborts the process.

We note that this hybrid model uses the PKI at the KGC
level and the CLPKI at the user level. This model can also be
extended to a hierarchical CLPKI model,
in which the root KGC and each intermediate KGC
have a certificate. Hence, the user can authenticate the public
parameters and the users in other domains using the chain path,
as in a traditional hierarchical PKI.

D. Periodic update of passwords and system's parameters 

The KGC must have a policy for the periodic update of
the users' passwords to avoid password learning attacks.
One possible method is to set a fixed period; when it
elapses, the KGC generates a new password for a particular
user, encrypts it using the user's old password and sends it to
the user, who decrypts it to obtain the new one.

To enhance the security of the system, the KGC uses the
Timestamp field in the public parameters record to indicate
any update of its system parameters. When the KGC submits
new parameters, it updates the Timestamp accordingly.
The user must periodically check the Timestamp field in the
public directory and compare it to the previously downloaded
one. If they do not match, the user downloads the new public
parameters, updates his/her key pair accordingly and publishes
it to the public directory.

Another way to apply the policy of periodic system parameter
updates is for the KGC to submit the lifetime of the
parameters instead of the Timestamp of their generation.
The application installed on the user's
device must then check the expiry date of the parameters. When
the parameters expire, each user shall access the public
directory and download the new system parameters. The user
shall also complete the procedure of changing his/her key pair
as described above.
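Both update policies reduce to a small client-side check. The sketch below is illustrative only; the record layout, field names and the `needs_refresh` helper are assumptions that combine the Timestamp comparison and the lifetime check in one place.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ParamsRecord:
    timestamp: datetime   # when the KGC published these parameters
    lifetime: timedelta   # validity period (lifetime-based policy)
    params: bytes         # opaque serialized system parameters


def needs_refresh(cached: ParamsRecord, published_ts: datetime,
                  now: datetime) -> bool:
    """True if the directory shows a different Timestamp or the cached
    parameters have reached the end of their lifetime."""
    changed = published_ts != cached.timestamp
    expired = now >= cached.timestamp + cached.lifetime
    return changed or expired
```

When `needs_refresh` returns true, the client would download the new parameters and regenerate and republish its key pair, as the text describes.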

E. Generation of public-private key pairs 

When the user completes his/her registration process and
gets a copy of the KGC's system parameters, he/she can
proceed to generate his/her public/private key pair as follows:
1) First, the user runs the Set-Secret-Value algorithm to
generate his/her two secret numbers $x_{ID}, x'_{ID}$. It is
worth noting that there must be a method to
control the generation of the secret number $x'_{ID}$. Even
if $x_{ID}$ is the same for many users,
this does not result in collisions of the private keys of
those users, because the private key of each user depends
on his/her identity, which is unique in the system. In the
case of $x'_{ID}$, however, in large-scale applications with a
very large number of users, two users might use the same
pseudo-random generator (possibly with different seeds)
and generate the same secret number, leading to a collision
of the users' public keys, which is not acceptable.
Such a collision may appear because all users generate
their secret numbers independently (i.e., the generation
algorithms are executed on the users' devices, not
centrally on the server). If some $t$ users ($t \ge 2$)
happen to have the same seed value, they are very likely
to generate the same secret
value. We propose three solutions to this problem.
Below we discuss each.

a) The KGC adds an extra parameter to its public
parameters. This parameter may be the pseudo-random
generator (PRG): the KGC divides the user
domain into sub-domains and submits a different PRG
for each sub-domain, which can guarantee that each
user generates a unique secret value $x'_{ID}$ even if he/she
has the same seed as other users.

Alternatively, this extra public parameter may be the seed
itself; dividing the user domain into sub-domains
can then also guarantee the uniqueness of
the secret value.

b) The KGC adds a new field to the user's record in the
public directory. This field holds the hash value
$H(x'_{ID})$ of the secret value generated by the
user. When a new user generates his/her secret
value, he/she must calculate its hash value and
compare it to the hash values existing in the public
directory. If a match exists, he/she
generates a new secret value until no match exists.
Once the user is sure that his/her secret value
is unique, he/she must add its hash value to
his/her record in the directory. In this way the
directory grows until the number of records equals
the number of users in the system. Although this
directory is public, attacks on the hash values such as
a dictionary attack are infeasible because the secret
numbers are very large.
c) The generation of the user's seed is made to depend
on the user's identity $ID$. We can develop a
special hash function $H_{ID}(ID \| S)$ that accepts the
user's identity together with some secure random
parameters $S$ and returns the seed that is later used to
generate the user's secret value. We let this hash
function generate the seed, rather than the secret value
itself, so that the same user can use the seed to generate
multiple secret values if the infrastructure policy supports
this scenario. This method is more efficient than the first two
methods because it needs neither KGC intervention
through the system parameters nor a search
operation before confirmation of the secret value.
The security of the secret value in this method
depends entirely on how the secure
random parameters $S$ are chosen. Therefore, the
parameters $S$ must be controlled and kept secure for each
user. To protect the secret value, no oracle for this hash
function should be provided, because that would trivially
expose the user's secret
value to the attacker and undermine the security of the
whole protocol.

2) The user generates his/her public key $\langle X_{ID}, Y_{ID} \rangle$ and
publishes it to the KGC server with his/her identity
$ID$. Note that $Q_{ID} = H_1(ID \| X_{ID})$, which guarantees
a one-to-one correspondence between the user's
identity and his/her public/private key pair, but not a
one-to-one correspondence between the user's public and
private keys.

3) The KGC accepts the user's $ID, Q_{ID}, X_{ID}, Y_{ID}$,
calculates the user's partial private key $D_{ID}$ and publishes
this record with Timestamp $T$ in the public directory
as $(ID, Q_{ID}, X_{ID}, Y_{ID}, H(x'_{ID}), MAC(x_{ID}), T)$.

4) The user authenticates the public directory to download
his/her partial private key in one of three ways: by
using his/her password and the challenge-response method
described previously, by checking the equality of
the pairing operation $e(D_{ID}, P) = e(Q_{ID}, P_{pub})$, or by
verifying the certificate of the KGC. The KGC can also
authenticate the user by using the extended
challenge-response method described previously.

5) The user's private key $S_{ID}$ is never stored on the device;
it is calculated every time the user needs it.
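Solutions (b) and (c) from step 1 lend themselves to a short sketch. It is a simplified illustration under several assumptions: SHA-256 stands in for both $H$ and $H_{ID}$, the public directory is modelled as an in-memory set, the prime $2^{255} - 19$ is only a placeholder for the group order $q$, and the function names are hypothetical.

```python
import hashlib
import secrets

Q = 2**255 - 19  # placeholder prime standing in for the group order q


def secret_fingerprint(x: int) -> str:
    """H(x'_ID): the value a user would publish in his/her directory record."""
    return hashlib.sha256(x.to_bytes(32, "big")).hexdigest()


def pick_unique_secret(directory_hashes: set) -> int:
    """Solution (b): redraw until H(x') is absent from the directory,
    then register the hash and return the secret value."""
    while True:
        x = secrets.randbelow(Q - 1) + 1
        tag = secret_fingerprint(x)
        if tag not in directory_hashes:
            directory_hashes.add(tag)
            return x


def seed_from_identity(identity: str, s_random: bytes) -> bytes:
    """Solution (c): seed = H_ID(ID || S), with S a per-user secret
    random parameter that must never be exposed through an oracle."""
    return hashlib.sha256(identity.encode() + b"\x00" + s_random).digest()
```

With solution (c), two users holding the same random parameter still obtain different seeds because their identities differ, which is the property the text relies on.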

By adding $MAC(x_{ID})$ to the user's record in the public
directory, our CLPKI model provides three important features:
Private-key Recovery, Private-key Portability and Private-key
Archiving. These are described below.

1) Private-key Recovery:

Private-key recovery is provided in case of file system
corruption or key theft. Since the value $MAC(x_{ID})$
exists in the directory, the user can decrypt this MAC
value with his/her key to extract his/her secret value and
use it to calculate his/her private key again.

2) Private-key Portability:

This mechanism also provides portability.
Since the user's secret value is stored publicly in the MAC,
the user can calculate his/her private key
from anywhere if he/she knows the MAC's secret key.
We assume here that the MAC's secret key is derived
from a password chosen by the user.

3) Private-key Archiving:

The KGC can make a backup of the user's record
(by user record we mean his/her public key along
with the MAC of his/her secret value $x_{ID}$) before any
public/private key change request. The KGC therefore
holds multiple backups of the user's past public/MAC values
with the starting dates of their periods of use. The user,
in turn, stores the passwords on which the MAC keys
depended at different times on secure media. When the user
needs to return to an encrypted
message from the past (for example, in a secure end-to-end
mail system), he/she must retrieve the keys
used for the message encryption. He/she must first prove
his/her identity to the KGC, and then request the KGC
to let him/her retrieve the public/MAC pair for
the specific period of time. He/she then decrypts the MAC with
the stored password, extracts the secret value $x_{ID}$,
uses it to calculate the private key of that time and finally
uses this private key to decrypt the message. These
backup records in the KGC also allow other users
to download a particular user's public key at a particular
time for signature verification purposes.
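The archiving lookup amounts to finding the record whose period of use covers a given date. A minimal sketch follows, assuming a per-user in-memory archive kept sorted by start date; the class, its fields and the byte-string placeholders for keys are hypothetical.

```python
import bisect
from dataclasses import dataclass, field
from datetime import date


@dataclass
class KeyArchive:
    """Per-user history of (public key, MAC of secret value) records,
    indexed by the date on which each record came into use."""
    starts: list = field(default_factory=list)
    records: list = field(default_factory=list)

    def add(self, start: date, public_key: bytes, wrapped_secret: bytes) -> None:
        idx = bisect.bisect_right(self.starts, start)
        self.starts.insert(idx, start)
        self.records.insert(idx, (public_key, wrapped_secret))

    def lookup(self, when: date):
        """Return the record in force on `when`, or None if none had started."""
        idx = bisect.bisect_right(self.starts, when)
        return self.records[idx - 1] if idx else None
```

A user decrypting old mail would call `lookup` with the message date and unwrap the returned MAC value with the password stored for that period; a verifier would use only the public key component.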

F. Public Key Revocation 

One of the biggest challenges in the traditional PKI is public
key revocation. The key revocation problem includes periodic key
renewal, key suspension and key revocation. By periodic key
renewal we mean that the system policy should enforce a key
change after a specific period of time, for example one week, one
month or even one year, depending on the application.
The KGC must send a renewal request to each user; the
user then generates a new public/private key pair and publishes
it to the public directory.

Key suspension is a temporary key revocation that may
happen when the user is temporarily out of the system. For
example, when an employee takes his/her annual vacation,
his/her key must be revoked temporarily (suspended) until he/she
returns to work, at which point the system shall restore his/her
keys automatically without changing them.

The third issue is permanent key revocation, or simply
key revocation. This may happen when an employee leaves
the enterprise, in which case the KGC must indicate
that his/her key has been revoked.

1) The notification system for public key revocation: In all
of the above cases, the biggest problem facing the
proposed system is the notification method, i.e., the means by
which the system can notify the other users that a public key
of some user has been renewed, suspended or revoked.
Traditional PKI has two well-known methods, the Certificate
Revocation List (CRL) and the Online Certificate Status Protocol
(OCSP). In the first method the system publishes a list of
revoked certificates, called the CRL, that contains the serial
numbers of all the certificates that have been revoked. A user
must download the CRL every time he/she needs to check a specific
certificate. The CRL method is practical in some small-scale
applications, but when the number of users grows this
method becomes impractical and inefficient, because of the
overhead it puts on the user's side and because the
download requests for the CRL are not bandwidth
efficient. A third reason is that the system becomes vulnerable
to Denial of Service (DoS) attacks.

The other method uses the OCSP protocol, which is simply
a web service. In this case the user does not need to download
the complete CRL to check a specific certificate. Instead,
he/she only sends the serial number of the certificate that
he/she needs to check to the OCSP server. The OCSP server
then sends the certificate status to the user, signed with the
OCSP private key. The user verifies the signature of the OCSP
server and checks the OCSP response. The response of the OCSP
server is a logical value: true if the certificate is revoked and
false otherwise. This method is more efficient than the former
one, but it is also vulnerable to DoS attacks. Moreover, the cost
of signature generation and verification must be considered when
the system runs in a large-scale application. A more recent
enhancement to testing the revocation status of a certificate is
Micali's NOVOMODO method [33], which
replaces the signature by a hashing operation and therefore
increases the efficiency of the OCSP protocol.
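The hash-chain idea behind NOVOMODO can be sketched briefly. The code below shows only the validity-proof half of the method (the separate revocation token is omitted), uses SHA-256 as an assumed hash function, and all function names and the 365-period lifetime in the usage note are illustrative.

```python
import hashlib
import secrets


def h(value: bytes) -> bytes:
    return hashlib.sha256(value).digest()


def iterate(value: bytes, times: int) -> bytes:
    """Apply the hash function `times` times."""
    for _ in range(times):
        value = h(value)
    return value


def issue(periods: int):
    """CA side: keep x0 secret and embed the validity target
    H^periods(x0) in the certificate."""
    x0 = secrets.token_bytes(32)
    return x0, iterate(x0, periods)


def validity_proof(x0: bytes, periods: int, day: int) -> bytes:
    """CA releases H^(periods - day)(x0) on `day` while the certificate
    is still valid."""
    return iterate(x0, periods - day)


def verify(target: bytes, proof: bytes, day: int) -> bool:
    """Verifier hashes the proof `day` times and compares it with the
    target found in the certificate; no signature check is needed."""
    return iterate(proof, day) == target
```

For a certificate issued with 365 periods, the proof released on day 30 verifies with 30 hash evaluations, which is where the efficiency gain over a signed response comes from.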

Public key revocation must cover the following
scenarios:

1) Applying the periodic renewal of the public key
determined by the system's policy.

2) A user requests a change of his/her public key when
his/her private key is stolen or compromised.

3) The system suspends a particular user's public key
for a while.

4) The problem shared by all the previous scenarios is the
notification mechanism, i.e., how the system can notify
the other users in the system about the status of a given
user's public key in an efficient way.

The proposed model provides solutions to key revocation
and its notification problem. We summarize these proposed
solutions as follows:

1) We can add a new field, called the status field, to each
user's record in the public directory. It indicates the status
of the user's public key at a given time. The possible
values of the status field are VALID, EXPIRED,
SUSPENDED and REVOKED. The KGC is responsible
for updating its users' status fields according to the
current information; if a user's public key has expired,
the system automatically changes its status
to EXPIRED. A user needs to send a query about a
specific public key every time he/she needs to use it, and
the KGC responds with the corresponding status value as
explained.

2) The other proposed solution uses the MAC
of the secret value that is stored in the public directory.
When a given user needs to change his/her private
key (in case of device theft or public key compromise),
he/she basically does not need to do anything, because
the proposed model provides automatic key recovery by
its nature. This is because the stolen device does not store
the user's private key, only the MAC and the point
$z_m P$. To obtain the user's hashed password $z_m$, the attacker
needs either to cryptanalyze the MAC function or to
solve the Elliptic Curve Discrete Logarithm Problem,
both of which are assumed hard. The user can also take an
extra step to further protect his/her private key and
make the stolen device useless: changing the
password that is later converted to the MAC key, calculating
a new MAC using the new key and publishing it to the
public directory. The stolen material then becomes
useless, because on the next attempt to use
the old private key, the client system will fail to decrypt
the MAC after downloading it from the public directory.
This removes the risk of the stolen private key being used,
without changing the published and distributed
public key and without the need to notify other users
about the change. Hence, key revocation
is fully transparent to the other users. Note
that this method requires the client system to download
the MAC from the public directory every time.
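The status-field solution can be pictured as a small lookup service. The sketch below is an illustrative assumption rather than part of the proposed model: the directory is an in-memory dictionary, the class and function names are hypothetical, and treating unknown users as REVOKED is a fail-closed design choice made here, not one stated in the text.

```python
from enum import Enum


class KeyStatus(Enum):
    VALID = "VALID"
    EXPIRED = "EXPIRED"
    SUSPENDED = "SUSPENDED"
    REVOKED = "REVOKED"


class PublicDirectory:
    """In-memory stand-in for the KGC's public directory (hypothetical API)."""

    def __init__(self) -> None:
        self._status = {}

    def set_status(self, user_id: str, status: KeyStatus) -> None:
        """Only the KGC is expected to call this."""
        self._status[user_id] = status

    def query(self, user_id: str) -> KeyStatus:
        # Unknown users are reported as REVOKED so that callers fail closed.
        return self._status.get(user_id, KeyStatus.REVOKED)


def may_use_key(directory: PublicDirectory, user_id: str) -> bool:
    """A relying user queries the status before every use of a public key."""
    return directory.query(user_id) is KeyStatus.VALID
```

Suspension and restoration then become two calls to `set_status` on the same record, leaving the key material itself unchanged.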

V. Security Analysis 

In this section we list the security services that our
infrastructure supports:

1) Confidentiality: we use any symmetric key cryptosystem
to provide confidentiality through message
encryption/decryption. The session key used for
encryption/decryption is agreed on by the two
communicating parties before the session starts. The
infrastructure allows the sender and receiver to authenticate
each other using the pairing operation before the key
agreement protocol starts.

2) Integrity: we use either a hash function or a MAC to
authenticate the messages in each session. When
the MAC function is used, the two communicating parties
run the key agreement protocol to generate another
secret key, different from the one used for
encryption/decryption.

3) Authentication: here we mean entity authentication. This
is already provided in our model, since the key agreement
protocol authenticates the parties first and then generates the
secret key, which ensures that only authenticated
parties can compute the shared secret key.

4) Non-Repudiation: by adding a signature to each
message, a party cannot deny having sent the
message, so non-repudiation is provided. We
can use any elliptic curve signature scheme, such as ECDSA.

5) Two-factor authentication: since the private key
depends on the user-chosen password, an authentic user
needs to have the device that stores the MAC of the secret
key as the first authentication factor, and also needs
to enter the correct password to decrypt the MAC and
calculate the full private key as the second factor.

6) Private Key Protection: since the private key is never
stored on the user's device, the proposed scheme
protects it in case of device theft, because the stolen
device is useless in any hands other than those of the
authenticated user.

VI. Conclusions and Remarks 

This paper discussed the weaknesses of the existing public
key cryptography infrastructures, particularly PKI and
Identity-Based Cryptography (IBC). It also addressed the
procedural issues related to Certificateless Cryptography.
A new model for a Certificateless Public Key Infrastructure
(CLPKI) is proposed, in which the public key and private key
are independent: the public key is generated
from a secret number other than the one used to calculate
the private key. This separation adds several features to
CLPKI schemes: private key protection in
case the device is stolen or compromised, transparent private
key revocation, private key recovery, private key portability
and private key archiving. The proposed model also provides
two-factor authentication to access the private information and
calculate the private key. The private key is never stored in
the device and is calculated every time the user needs
it. The user needs a strong password to recover the private
key and use it with other parameters to calculate the
per-session symmetric key or to sign a message to provide
non-repudiation. The proposed model also addresses the public
key revocation problem: if the private key is compromised, the
user does not need to change his/her public key. He/she only
needs to change his/her password, then calculate and publish a
new MAC in the public directory. This eliminates a lot
of work in calculating a new public key, publishing it
and notifying the other users about it. The
MAC used in the proposed scheme provides private key
portability, i.e., the ability of the user to calculate his/her
private key at any time and from anywhere. This is achieved because
the MAC of the user's secret value is stored in the public
directory: every time the user needs to recover his/her private
key, he/she simply downloads his/her MAC from the public
directory, decrypts it to extract his/her secret number and uses
it to calculate his/her private key (assuming that he/she has the
system public parameters).

Abstract - While X.509 Public Key Infrastructures (PKIs) and
X.509 Attribute Certificates (ACs) enforce strong authentication
and authorization procedures (respectively), they do not give the
user management over his/her own attributes. This is especially
important with regard to users' personal information when
a service provider requests more than is necessary, including
sensitive information such as medical data, and users need control
over the attributes they are sharing. We present an Attribute-Based
Public Key Infrastructure that addresses the management of
users' attributes and gives users more control
in identity and access management systems and in document
signatures. Our user-centric scheme also simplifies confidence
in the attributes' validity and the verification procedures.

Index Terms — Attribute-Based, Public Key Infrastructure, 
Identity Management, Attributes, User-Centric. 

I. Introduction 

The growing number of Service Providers (SPs) available over
the Internet (e.g., social networks, e-mail, e-commerce,
e-learning, multimedia centers) and the ease of accessing them
through smartphones and other mobile devices demand
close attention to the authentication and authorization
procedures between SPs and users. Many Authentication and
Authorization Infrastructures (AAIs) require users' registration
and the storage and management of users' attributes in a database
or directory. It is important that the attributes' management
be trustworthy and that the attributes cannot be used for
purposes other than those determined by the owner.

The X.509 Public Key Certificate (PKC) provides asymmetric
cryptography functions and the advantage of binding
a key pair to the subject's specific information. PKCs
support a strong authentication method and digital signatures
[1], [2]. The X.509 Attribute Certificate (AC) enables the
use of digital certificates for access control and delegation
functions [3]. The management of a certificate's life-cycle,
provided for example by an X.509 Public Key Infrastructure (PKI)
or an X.509 Privilege Management Infrastructure (PMI)
[4], is criticized because of the number of processes needed
to verify the trust of a certificate and to manage
the revocation procedures [5], [6]. Another drawback of PKI
is the way digital certificates are issued, which does not
allow owners to change any personal information within
their certificates. If any information needs to be changed,
a new certificate has to be requested and the previous one
revoked. This procedure may be costly for the users and for
the infrastructure.



The management of access control, role and permission
attributes in users' digital certificates could be handled by
the use of attribute certificates. While these do provide some
benefits, ACs can only be used for authorization procedures.
Attribute certificates can be used together with PKCs to provide
a stronger authentication and authorization mechanism, but if
the AC is linked to a public key certificate, the complexity of
the verification procedures increases. Unlike other
AAIs (e.g., Shibboleth [7], OpenID [8]), PKCs and ACs are
simpler for the entities involved but costly for users. Here
we consider a way to improve the attribute management
capabilities of PKCs and ACs while providing the same
functions. This would improve identity management, access
management and digital signatures.

a) Contribution: The aim of this paper is to propose an
Attribute-Based Public Key Infrastructure for identity and
access management (IAM) and document signatures that improves
the management and the disclosure of users' attributes. Our
model simplifies the way users disclose their attributes,
the necessary verification procedures, and the validation of
attributes and digital signatures by a trusted party. The idea
is based on the use of asymmetric cryptographic functions
and assertions self-signed by the user. The assertion, which
contains attributes claimed by the user, is verified and certified
by a Notarial Authority (NA). The NA certifies the assertion
by contacting the authority responsible for managing the
life-cycle of the user's attributes.

b) Outline: We start this paper by describing the difficulties
related to X.509 PKI and PMI and their usage as
authentication and authorization mechanisms. In Sect. III we
list related works. Next, we present our proposal, followed by
definitions and illustrated examples (Sect. IV). A more practical
description of our idea, including the procedures
that players in our scheme perform, is also given
(Sect. IV-B). Afterwards, in Sect. V, we analyze
the model by comparing it with the X.509 PKI and
PMI model. Finally, we present our considerations and future
work (Sect. VI).

II. Problems 

Public key infrastructure emerged in order to manage the
life-cycle of public key certificates [1]. PKCs can be applied
to automated identification, authentication, digital signatures,
access control and authorization functions in a digital
environment. In contrast to the benefits provided by PKCs (the
asymmetric cryptography functions), a PKI architecture brings
some disadvantages. One disadvantage is the certificate
verification procedure, in which a certification path is needed to
know which certificate authorities participated in the issuance
of the end-user certificate [5]. This cannot always be
performed easily and quickly, which causes problems in some
environments and situations. Furthermore, it can also affect
the revocation and verification procedures of a certificate.

To preserve the security of PKC functions, a revocation
procedure needs to be fast, efficient and timely for large
infrastructures, and should reduce the amount of data
transmitted [5]. The certificate's revocation state needs to be
published via certain techniques, e.g., a Certificate Revocation List
(CRL) [1] or the On-line Certificate Status Protocol (OCSP) [9].
Certificate validation is composed of several verification
processes, e.g., checking the certificate's integrity, the validity
period and the key usage (the applicability according to the PKI
policies), and obtaining the certification path and the revocation
status of all certificates in the path. The device used
to verify a certificate requires a minimum of computational,
storage and networking resources. For signature verification,
the revocation status needs to be included in the signature.
Sometimes these procedures cannot be done off-line or on
devices with limited resources.

If anything happens to the certificate before the end of its
validity (e.g., loss of the private key, theft of the key,
a change in information), a revocation procedure
is needed [10]. Since most attributes for access control, role,
and permission do not have a long lifetime (i.e., longer than a
certificate's validity period), it is not recommended to include these
types of information in a PKC. Additionally, the use of PKCs
for access control is not recommended because the certification
authorities may not be responsible for the management of
those users' attributes. In this case, attribute certificates could
be a solution; however, two different infrastructures would be
necessary to manage PKCs and ACs, increasing the costs, the
human and computational resources required, and the security issues.

PKCs have the advantage of being supported by and implemented
with other authentication and authorization mechanisms.
However, PKI is difficult and expensive to implement.
It requires a lot of effort for its management and maintenance,
leaving doubts as to the cost-benefit of its functionality
[11], [12]. Beyond the problems already stated
above, we explore the following problems related to PKI as
an identity and access management mechanism.

Problem 1. The management of the users' attributes to issue
and maintain PKCs (or attribute certificates) is not optimal.
The use of a PKC discloses some of the user's information
that may not be necessary in a particular situation. In the
procedure to issue certificates, as well as in identity and access
management, the users' attributes must be managed and stored
by appropriate trusted entities that are responsible for those
attributes, avoiding copies and increasing reliability.
Additionally, users must have more control over the disclosure
of their attributes by claiming only the necessary ones.



Problem 2. The number of procedures and the complexity
required to implement a PKI and to maintain trust in the PKCs
(which increase with the use of ACs) make a PKI costly for
the domain and for end-users. The functions provided by the use
of PKCs (e.g., authentication, authorization, digital signature)
should maintain their strength while the complexity
and the costs are reduced, so that PKI reaches the same
cost-benefit level as other identity and access systems.

III. Related Work 

There are many proposals in the literature to improve the
conventional PKI. In this section we present works
related to the problems described in Sect. II.

Some works proposed alternatives to improve the certificate
revocation mechanisms, such as the CRL-based alternatives that
attempt to overcome the scalability problem [13], [14]. Other
works provide revocation data that is smaller and easier to
evaluate than a CRL [15]-[17]. Some works aim to improve
the existing revocation mechanism [18], while others propose
a PKI without revocation checking [19]. Faced with the various
revocation mechanisms, both existing and proposed, some
works analyzed the cost of each mechanism [20]-[22].

Alternative PKI models and concepts were created to give
PKI a different architecture. Moecke et al. [23] proposed a
change to the way certificates are issued, creating
a Validation Authority that takes over the responsibility of the
certification authority and the Time Stamping Authority (TSA)
function in digital signatures. To reduce the difficulty of
validating a digital signature and the certificate chains, it uses
self-signed certificates. Another PKI scheme is SPKI (Simple
Public Key Infrastructure) [24]. It proposes a simplified
PKI architecture and focuses on authorization processes, binding
a key to a user's authorization [25]. SDSI (Simple
Distributed Security Infrastructure) combines the SPKI design
with the definition of groups in order to issue group
membership certificates [26]. Despite its simplicity, SPKI/SDSI is
limited because, for example, there is no formal bond of trust
between the entities involved, and a member can make an inquiry
on behalf of its group.

Focusing on digital signature issues, Custodio et al. [27]
proposed issuing a special certificate (an optimized certificate)
to make certificate path verification and digital signing
more efficient, replacing the signer's certificate and validity. It
can also substitute for the time-stamping service. Vigil et al. [28]
extended the work of Custodio et al. by using a new
entity named "CryptoTime" to publish "Novomodo proofs",
and presented the comparative costs of storing and verifying a
signed document.

NBPKI (Notary Based PKI) focuses on long-term signatures
[29]. Modeled on real-world handwritten signatures,
notaries are responsible for certifying that a signer's certificate
is trustworthy by verifying a particular signature at a specific
time. Users issue their own PKCs, which are validated and
certified by a notary. This model simplifies the maintenance of
a document's signature and makes the process more intuitive,
but does not focus on the management of users' attributes.

Attribute-Based Cryptography (ABC) is based on
Identity-Based Cryptography (IBC) [30] and allows users to decrypt a
ciphertext by using their attributes and the policies associated
with the message and the user [31], [32]. Users request
their private keys (as in IBC) based on attributes/policies.
Anyone can create ciphertexts by incorporating attributes and
policies. ABC shares a drawback with IBC:
a trusted third party is needed to issue the users' private
keys, which also leads to the key escrow problem.

A. Identity and Access Management 

Over the past few years many standards, paradigms,
specifications, frameworks and software have been implemented to
improve AAIs [33]. Jøsang and Pope
conclude that user-centric approaches to
AAI improve the user experience and the security of on-line
service provision as a whole [34]. The user-centric paradigm
aims to give the user control over the different aspects of his
identity, i.e., his "partial identities".

Another common AAI paradigm is the federated one.
Normally used for academic federations (e.g., the Shibboleth
framework, based on the SAML standard), it is composed of an
Identity Provider (IdP) and Service Providers (SPs), where the
former is responsible for managing the users' attributes and
authenticating users for the SPs [35]. The SP authorizes users
to access a resource according to the users' attributes received
from the IdP. All entities in the federation form a "circle of trust"
and must agree to the same policies.

In on-line systems, where IdPs create access tokens on
demand (e.g., SAML, OpenID, WS-Federation) [35], [36],
the impersonation of users and the tracking of users'
on-line accesses are possible consequences. Systems with
off-line token creation, such as X.509 PKCs and some WS-Trust
profiles [37], force the user to reveal more attributes than
needed (as otherwise the issuer's signature cannot be verified).
In order to minimize these effects, it would be desirable to
make a request to an IdP without the IdP knowing which SP the
user is accessing, using a signed assertion that claims
the necessary attributes.

There are not many systems that support attribute certificates.
One of them is the PERMIS project (Privilege and
Role Management Infrastructure Standards Validation), an
access control management system that complements an
authentication system [38]. This framework controls access
by using attribute certificates to store users' roles. All access
control decisions are driven by the authorization policy, which
is included in the attribute certificate, thus guaranteeing its
integrity. Integration with PKCs is possible, but it
increases the complexity.

IV. Attribute-Based Public Key Infrastructure 

Attribute-Based Public Key Infrastructure (ABPKI) aims
to manage users' attributes in a way that gives the user
more control over the disclosure of his attributes to service
providers for authentication and authorization procedures. The
model takes advantage of real-world notarial responsibilities
and services, and helps users to be more aware of the service
policies and of which attributes are shared with service providers.
ABPKI also supports document signature functions,
enabling the user's chosen attributes to be bound to
the signature and validated by a notary. In this section we
describe the entities involved, the procedure flows when a
user has his attributes verified in order to obtain a resource
from the SP, and how ABPKI works in the document signature
procedure.

A. Components 

In this subsection, we define the concepts involved in our 
model. We define two main entities: Attribute Provider (AP) 
and Notarial Authority. ABPKI uses a Trust-service Status List 
(TSL) to manage trusted entities. 

1) Attribute Provider: An Attribute Provider is an entity
responsible for registering attributes for the user (e.g., name,
surname, e-mail address, occupation, public key), storing
the information in a trusted database system, and keeping
the attributes up to date. APs could be the entities already
responsible for registering users' attributes for governmental,
professional, or even business purposes. Each AP has an asymmetric
cryptographic key pair that is used in the communication flow
and is managed in a Trust-service Status List.

2) Notarial Authority: A Notarial Authority is a point of 
trust responsible for receiving self-signed assertions from users 
and validating users' attributes. The NA communicates with 
the attribute provider and, if the AP confirms the correctness 
of the user's attributes, the NA co-signs the assertion. This 
procedure certifies the truthfulness of the user's attributes. To
be recognized as a trusted authority, each NA has an asymmetric
cryptographic key pair used to sign the assertions and to
secure the communication. Trust in the public keys tied
to each NA and AP is managed by a Trust-service Status
List. For document signature purposes, the NA certifies the user's
information bound to the document's signature.

3) Trust-service Status List: A Trust-service Status List 
provides an assessment structure which has an overseer role 
with respect to trust in the services and their providers. 
TSL makes trustworthy information about services and their 
providers available, along with a historical status and the 
associated public keys [39]. A TSL may be composed of a
list of TSLs, and it is managed, signed, and published in a
trusted public repository by a trusted entity of its domain.

B. How it works 

To take advantage of ABPKI, the user must create an
asymmetric cryptographic key pair and register his public key
in the database of each AP that already manages or will manage
one or more of his attributes. The registration could be done
in person or through a web service. If the AP already has an
authentication mechanism installed, the user's public key is
registered after the user authenticates with it. The
user authentication mechanism is then migrated to the use of
asymmetric key pairs. Otherwise, the most secure way is for
the registration to be done in person.

The key pair can be created by software,
e.g., a local (desktop) program or a web service provided by the AP,
as long as the private key is generated on the user side. The
keys are placed in a secure device (e.g., smartcard or USB
token), from which the public key can be easily extracted and in
which the private key can only be used with a secret (e.g., password,
PIN). The key pair is not associated with any digital certificate;
consequently, the keys can exist for much longer. However,
their validity is still tied to the cryptographic
algorithm, i.e., when the algorithm is no longer considered secure,
the keys created will no longer be useful either. If something
happens to the user's private key during its period of validity,
a procedure to change the association between the user's public
key and his attributes must be executed.

This procedure requires that the user obtain a code (a
sequence of characters) from the AP (e.g., by e-mail, on paper,
or on a device) at the moment he registers his public key. This code
is a One Time Password (OTP) [40], and it is used only once.
With this code, the user accesses a web service to associate
his new public key with his attributes. This process requires
a challenge-response mechanism to confirm the identity of
the user, such as a confirmation question about an attribute value
known by that AP. We envisage that such an OTP code can
initially be printed on a paper document (such as a paper
certificate) and handed over to the user.
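
The one-time nature of the replacement code can be sketched in Python as follows. The class and method names, the in-memory dictionaries and the use of `secrets.token_urlsafe` to generate the code are illustrative assumptions; the additional challenge-response question mentioned above is left out of this sketch.

```python
import hmac
import secrets


class AttributeProviderRegistry:
    """Toy AP store: user id -> registered public key and unused OTP."""

    def __init__(self):
        self._public_keys = {}
        self._otps = {}

    def register(self, user_id: str, public_key: bytes) -> str:
        # The AP hands the user a one-time code when the key is registered.
        otp = secrets.token_urlsafe(16)
        self._public_keys[user_id] = public_key
        self._otps[user_id] = otp
        return otp

    def replace_key(self, user_id: str, otp: str, new_public_key: bytes) -> bool:
        stored = self._otps.get(user_id)
        if stored is None:
            return False
        if not hmac.compare_digest(stored.encode("utf-8"), otp.encode("utf-8")):
            return False
        # The code is consumed so that it cannot be used a second time.
        del self._otps[user_id]
        self._public_keys[user_id] = new_public_key
        return True

    def public_key(self, user_id: str) -> bytes:
        return self._public_keys[user_id]
```

In this sketch a second replacement attempt with the same code is rejected; a real deployment would presumably issue a fresh code after each successful replacement.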

C. Getting Attributes Certified 

Let us suppose that a user wants to request a resource
from an SP. First, the user has to create a data structure
containing the attributes he claims. We call this
data structure an Attribute Authentication Assertion (AAA);
it is illustrated in Fig. 1a. An AAA contains: the user's
public key; one or more attribute tuples, where each tuple is
composed of the Object Identifier (OID) of the attribute, the
attribute's value and a reference (e.g., URI) to the AP responsible
for that attribute; and the AAA's validity (set by the AAA's
owner). The user's public key is an identification attribute that
is associated with the other attributes in the AP's database. If the
user manages two or more key pairs registered with
different sets of APs, the AAA must contain the
public key that is registered with the corresponding AP. The validity
field is set by the owner (the user) and its default value is one
week. The structure is signed by the user.
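
The fields of an AAA listed above can be sketched as a small Python data structure. The class names, the JSON serialization used as the byte string to be signed and the string representation of the public key are assumptions made for illustration; the user's signature itself is not computed here.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AttributeTuple:
    oid: str           # Object Identifier of the attribute
    value: str         # value claimed by the user
    ap_reference: str  # reference (e.g., URI) to the responsible AP


@dataclass(frozen=True)
class AAA:
    public_key: str
    attributes: tuple
    not_after: datetime

    def signing_bytes(self) -> bytes:
        # Assumption: a canonical JSON encoding is what the user signs.
        payload = {
            "public_key": self.public_key,
            "attributes": [[a.oid, a.value, a.ap_reference] for a in self.attributes],
            "not_after": self.not_after.isoformat(),
        }
        return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def new_aaa(public_key, attributes, issued_at, validity=timedelta(weeks=1)):
    # The text gives one week as the default validity set by the owner.
    return AAA(public_key, tuple(attributes), issued_at + validity)
```

Producing a deterministic byte string matters because the user, the NA and the SP must all verify signatures over exactly the same encoding of the assertion.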

There are two modes in which a user can utilize an AAA:
sending it directly to an SP without certification by an
NA, or getting the AAA certified first and then sending it
to the SP. Depending on its policies, the SP may consider
that the user is trustworthy and has claimed valid attribute
values. However, the SP might want to check the veracity of
the AAA and the user's attributes by sending it to an NA.

To get an AAA certified by an NA, the AAA has to be sent
to an NA (step 1 in Fig. 2). The NA verifies the
signature on the AAA using the user's public key and checks the
validity. If the signature is correct and the expiration date has
not passed, the NA proceeds to verify the user's attributes
by creating a data structure called an Attribute
Validation (AV). The AV's structure is shown in Fig. 1b. An
AV is composed of the user's public key and the attribute tuples
(containing the OID, the attribute's value and the AP's reference).
The NA signs the AV with its private key. The AV is sent to the
AP that is referenced in the attribute tuples (step 2
in Fig. 2). For each different AP reference among the tuples,
the NA creates a separate AV and sends it to the
respective AP.
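
The splitting of one AAA into one AV per referenced AP can be sketched as a simple grouping step. The function name and the dictionary layout of an AV are illustrative assumptions, and the NA's signature on each AV is omitted.

```python
from collections import defaultdict


def build_attribute_validations(public_key, attribute_tuples):
    """Group (oid, value, ap_reference) tuples into one AV per referenced AP."""
    grouped = defaultdict(list)
    for oid, value, ap_reference in attribute_tuples:
        grouped[ap_reference].append((oid, value, ap_reference))
    # Each AV carries the user's public key plus the tuples for a single AP.
    return {
        ap: {"public_key": public_key, "attributes": tuples}
        for ap, tuples in grouped.items()
    }
```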

As soon as the AP receives the attribute validation request,
it verifies the NA's signature and the truthfulness of the binding
between the user's public key and the set of attribute values
registered in its records. If all the attribute values are correct,
the AP signs the AV and returns only the signature to
the NA (step 3). The NA verifies the AP's signature on each
AV (if there are several) and, if all are correct, co-signs the
AAA and returns it to the sender (a user or an SP) to be used for
authentication or authorization processes (step 4). An AAA
co-signed by the NA means that the NA has verified, along
with the responsible AP, the veracity of the attributes. If it was
an SP that requested the AAA validation, the
NA's signature on the AAA is also returned to the user
by the SP. The user takes the NA's signature on the AAA and
concatenates it with his AAA so that it can be reused later.
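
The decision logic of steps 3 and 4 can be sketched as two small Python functions. Signatures are replaced here by plain boolean confirmations, and the record layout (public key mapped to a dictionary of OID and value) is an assumption made for illustration.

```python
def ap_confirms(ap_records, av):
    """AP side: confirm that every claimed value is bound to the public key."""
    registered = ap_records.get(av["public_key"], {})
    return all(registered.get(oid) == value for oid, value, _ in av["attributes"])


def na_should_cosign(avs, records_by_ap):
    """NA side: co-sign only if every referenced AP confirmed its AV."""
    # An assertion without any AV gives the NA nothing to certify.
    return bool(avs) and all(
        ap in records_by_ap and ap_confirms(records_by_ap[ap], av)
        for ap, av in avs.items()
    )
```

A single unconfirmed attribute, or a reference to an AP that cannot be reached, is enough for the NA to withhold its co-signature in this sketch.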

As previously stated, an NA is a trusted entity for society.
However, the same does not necessarily hold for an AP, i.e.,
there could exist an AP that was not delegated by the government
and is not necessarily trusted by all NAs. Depending on
its policy, an NA might not accept the validity of
an attribute managed by a certain AP. In these cases, another
procedure can be followed if there is at least one NA that
trusts the respective AP. When an NA receives an AAA
(step 1) that contains a reference to an AP it does not trust,
it should look for an NA that trusts that AP and send
the user's AAA to that NA (NA_2 in step 5). To prevent a
possible loop in this process, every NA manages and publishes
its own TSL, which contains the APs it trusts. If no
NA trusts that specific AP, the attributes claimed by
the user cannot be verified. NA_2 communicates with the
respective AP (AP_2 in step 6) and, if the user's attributes
are correct (step 7), NA_2 signs the AV and sends the
signature to the NA (step 8). As a result, the NA sends
NA_2's signature to the user (step 9).




Fig. 2. Workflow to certify an attribute set. 
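
The forwarding decision in steps 5 to 9 can be sketched as a lookup over the TSLs that the NAs publish. The function names and the representation of each published TSL as a set of AP references are illustrative assumptions. Because the receiving NA consults the published lists before forwarding, it forwards at most once, which is one way to avoid the loop mentioned in the text.

```python
def find_trusting_na(ap_reference, own_name, published_tsls):
    """Return the name of another NA whose published TSL lists the AP, or None."""
    for na_name in sorted(published_tsls):
        if na_name != own_name and ap_reference in published_tsls[na_name]:
            return na_name
    return None


def route_validation(ap_reference, own_name, published_tsls):
    # Handle the request locally when the AP is trusted; otherwise forward once.
    if ap_reference in published_tsls.get(own_name, set()):
        return own_name
    return find_trusting_na(ap_reference, own_name, published_tsls)
```

A result of `None` corresponds to the case in which no NA trusts the AP and the claimed attributes cannot be verified.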

Each NA manages its trust in the APs according
to its policies. An AP's information and public key
must be included in the NA's TSL if the AP is trustworthy or was
once trustworthy. If the AP was once considered trustworthy
by an NA but is no longer trusted, the TSL must indicate that
the AP is no longer trustworthy and since when. Every NA
has its own TSL. Since a TSL can be composed of a list of TSLs,
every NA may hold the TSLs of other NAs and know
which APs each of them trusts. We consider that the domain's TSL
is managed by the point of trust of the domain (e.g., an entity
delegated by the government of a county). All TSLs should be
signed by their manager.
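
The requirement that a TSL keep formerly trusted APs together with the time at which trust ended can be sketched as follows. The class names, the numeric timestamps and the field names are illustrative assumptions, and the manager's signature over the list is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TslEntry:
    ap_reference: str
    public_key: str
    trusted_since: float
    untrusted_since: Optional[float] = None  # set when trust is withdrawn

    def trusted_at(self, when: float) -> bool:
        if when < self.trusted_since:
            return False
        return self.untrusted_since is None or when < self.untrusted_since


@dataclass
class Tsl:
    owner: str
    entries: dict = field(default_factory=dict)

    def add(self, entry: TslEntry) -> None:
        self.entries[entry.ap_reference] = entry

    def withdraw(self, ap_reference: str, when: float) -> None:
        # The entry is kept so the list still records since when trust ended.
        self.entries[ap_reference].untrusted_since = when

    def trusts(self, ap_reference: str, when: float) -> bool:
        entry = self.entries.get(ap_reference)
        return entry is not None and entry.trusted_at(when)
```

Keeping the historical entry lets a verifier decide whether an AP was trusted at the moment an older AAA was co-signed, not only whether it is trusted now.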

D. Verifying an AAA 

The user can send a service provider an AAA that is only
self-signed or one already certified by an NA. If the SP receives
an AAA self-signed by the user, it verifies the user's
signature on the AAA and the validity, and can take an action based
on the user's attributes. If the SP wants to have the AAA verified
by a notarial authority, it should send it to an NA (cf.
Sect. IV-C). In the other method, the SP receives from the user
an AAA already verified by an NA and verifies
the NA's signature included in the user's AAA. The "validity"
field is also verified by the SP, but its policy could require a
fresh AAA. To verify that the user who sent the AAA is the same
one that created the AAA structure, the SP executes a
public key challenge-response protocol with the user in both
methods. All communication is
made over a trusted channel, such as SSL/TLS.
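
The policy side of the SP's check (validity, optional NA certification and freshness) can be sketched as a single function. The parameter names, the presence of an issue time in the AAA and the `max_age` freshness rule are assumptions made for illustration; signature verification and the challenge-response exchange are outside this sketch.

```python
from datetime import datetime, timedelta
from typing import Optional


def sp_accepts(
    issued_at: datetime,
    not_after: datetime,
    now: datetime,
    na_cosigned: bool,
    require_cosign: bool,
    max_age: Optional[timedelta] = None,
) -> bool:
    """Return True if an AAA passes the SP's validity and freshness policy."""
    if now > not_after:
        return False  # the validity set by the owner has run out
    if require_cosign and not na_cosigned:
        return False  # the policy demands an NA-certified AAA
    if max_age is not None and now - issued_at > max_age:
        return False  # the policy demands a fresher AAA
    return True
```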

E. Signing and Verifying Documents 

The ABPKI can be used to validate and certify document
signatures with fewer verification processes than PKI. To certify
a document signature, the user signs a document using his
private key. Then he creates an AAA in which he includes the
document's signature and some other attributes of his own,
e.g., that he is a lawyer or something that identifies him
as a lawyer. The NA verifies the user's attributes with the
respective AP and signs the AAA if they are valid. The NA's
signature on the AAA (i.e., the AAA co-signature) confirms that the
signer's public key and the attributes were trustworthy and
associated with the document's signature when the AAA was
co-signed. Thereafter, the user attaches the AAA certified by
the NA to the related document.

The verification of the document's signature is done
in several steps. First, the mathematical verification of the
document's signature is done with the public key included in
the AAA. Next, semantic correctness is guaranteed if the
signer's public key and his attributes were trustworthy at the
moment the NA co-signed the AAA. The verifier (who
receives the AAA from the owner together with the signed
document) gets the corresponding NA's public key (from the
TSL) and verifies the AAA's co-signature. If the cryptographic
algorithm used is still valid, then the signer's attributes, the
signer's public key and the document's signature are still valid and
certified by that NA. The verifier can also calculate the document's
digest and compare it to the digest stored among the document
signature's values as an attribute in the AAA. If all verifications
for mathematical and semantic correctness succeed, then the
verifier infers that the document was correctly signed with
trustworthy attributes and that they existed at the time the NA
co-signed the AAA.
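
The three checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not fix an AAA format, so the `AAA` field layout, the `cosigned_payload` serialization, the use of SHA-256 and the `verify` callback (a stand-in for whatever public key signature scheme is deployed) are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical signature check: (public_key, message, signature) -> bool.
# A real deployment would plug in its own public key signature scheme here.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass
class AAA:
    """Illustrative AAA layout (assumed; the paper leaves the format open)."""
    signer_public_key: bytes
    attributes: Dict[str, str]
    document_digest: bytes       # digest carried as an attribute
    document_signature: bytes    # user's signature over the digest
    na_cosignature: bytes = b""  # NA's signature over the fields above

    def cosigned_payload(self) -> bytes:
        # Assumed canonical serialization of everything the NA co-signs.
        attrs = ";".join(f"{k}={v}" for k, v in sorted(self.attributes.items()))
        return b"|".join([self.signer_public_key, attrs.encode("utf-8"),
                          self.document_digest, self.document_signature])


def verify_document(document: bytes, aaa: AAA, na_public_key: bytes,
                    verify: Verifier) -> bool:
    # 1. Mathematical check: the user's signature under the key in the AAA.
    if not verify(aaa.signer_public_key, aaa.document_digest,
                  aaa.document_signature):
        return False
    # 2. Semantic check: the NA's co-signature over key, attributes, signature.
    if not verify(na_public_key, aaa.cosigned_payload(), aaa.na_cosignature):
        return False
    # 3. The digest stored in the AAA must match the received document.
    return hashlib.sha256(document).digest() == aaa.document_digest
```

In this sketch the NA's public key would be taken from the TSL before calling `verify_document`; that lookup is outside the example.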

F. Use Cases 

The ABPKI aims to maintain the same features as PKCs,
but with many more facilities for the user. When a user wants
to prove personal information to a digital service, or in a
physical place, ABPKI can ease the requirements for
authentication and authorization.
The user keeps control over his own information, and
decides when, where and to whom the set of attributes is
disclosed. Take, for example, a health care environment in which a
person has a disease (e.g., human immunodeficiency virus,
cancer) and needs medication for treatment, but does not want
to be identified by name or to reveal which disease he has.
This person is allowed to get treatment (e.g., in a drugstore or
medical center), but he needs to prove some attributes (e.g., the
medication's name, the dose, the identifier of the prescribing
doctor). The person creates an AAA claiming these attributes and
sends it to the responsible sector of the medical center, and the
medical center asks an NA to verify the AAA. The NA
communicates with the related AP (e.g., the hospital where the
user was diagnosed). If the information is correct, the NA's
signature on the user's AAA is returned and the medical center
can authorize the person to receive the treatment. Alternatively,
the person could present an AAA already validated by an NA, which
makes the verification process shorter and faster.

Another use case could involve a Driver and Vehicle Licensing
Authority (DVLA) acting as an AP and a user wanting
to contract an insurance service for his vehicle. The user could
claim to the insurance company any information about his
vehicle, which the DVLA validates, and he could also claim his
name, to be verified by the responsible government AP.
The user signs the contract and the insurance company verifies
it as described in Sect. IV-D. The company could also give
its users some kind of benefit (e.g., a discount on charges),
for instance to those who claim not to have received any penalty
in recent years. The DVLA would confirm the user's claimed
information to an NA, so that it is validated for use
by the company. In this last case, the user does not need to
be registered or even to show his name or other unnecessary
attributes.

With the use of asymmetric keys, the user can access SPs
through a public key challenge-response protocol. An AAA
carries the user's public key and the attributes needed to allow
access to a resource. The ABPKI aims to be used in simpler
environments, where an X.509 PKI does not fit. The user
can manage his AAAs through smartphones or other mobile
devices, in order to be able to apply for access anytime and
anywhere.

V. Analysis 

In this section we present the evaluation of ABPKI. For this,
trust assumptions from X.509 PKI and identity management
systems are described.

An NA is a trusted party in ABPKI and all NAs are
delegated by the government. We assume that the private key
of the NA cannot be compromised. If something happens to the
NA's private key, it must be reported to the domain's TSL and
a new key pair must be created. We assume that an updated TSL
is published as soon as the problem has been reported
and resolved. The history of each authority must be described
in the TSL's records. It is also assumed that the private key
of the AP is held securely. If something happens to the AP's
private key, then it must be reported to all NAs and a new
key pair must be created and registered. The trust of an AP
is managed by the NA. It may happen that an NA does not
agree with the AP's policies (e.g., the manner in which the users'
attributes are updated). On the other hand, other NAs could
agree and accept to validate the attributes managed by that
AP. It is important that each AP is trusted by at least one NA.
If an AP is trusted by no NA, then that AP is considered untrusted
by the ABPKI scheme.

In X.509 PKI, it is costly for the end-user to obtain a PKC.
Moreover, the information included in users' certificates may
not always be necessary or have the same validity period.
An X.509 attribute certificate cannot resolve these problems
because it needs to be associated with the user's
PKC to provide strong authentication. On the other hand, in
ABPKI the user is left to decide which attributes
are disclosed in each environment and situation. The user
bears no cost for acquiring a certificate, nor for a key pair.
Instead, the NAs can earn revenue by charging whoever
requests the verification of an AAA. In this way, those
who use the ABPKI functions more must spend more, which
differs from the PKI model.

We do not set a specific format for the AAA, but
it could be XML. The SPs should specify a template
informing users which attributes are necessary to access the
resources. The user could manage many AAAs validated by an
NA, with a different public key registered at each different
AP. This increases the user's privacy with respect to the
association between public key and attributes, but it also
increases complexity for the user. If a user maintains more than
one AAA co-signed by an NA and different key pairs, we assume that
he could use software to manage the files. This software could
be installed on a desktop, on a mobile device, or in the cloud to
facilitate usability.

The trustworthiness of the user's information is established
simply by verifying the NA's signature in the AAA. If it is
necessary to authenticate the sender, the public key included
is used to perform a challenge-response protocol. There is no
certificate revocation mechanism in ABPKI, because there is
no certificate. The AAA's validity is set by the user and
cannot exceed one year; however, the SP can decide whether it
wants to receive a newer AAA. The user's key revocation is
done by using an OTP code and registering a new public key
in the AP's records. This is done at each AP in which the
user has an attribute. As a result, the PKI problems described
in Sect. II do not occur in ABPKI.

Other AAIs, such as the federated ones, require a "circle of
trust" between all IdPs and SPs, and
they must keep the same policy agreements. In the ABPKI, the
point of trust is set by the NAs and by how trustworthily APs manage
users' attributes. The SPs inform users which attributes are
necessary and the users decide and claim which attribute
values they want to disclose. In some federation systems,
the IdP just informs the user which attributes the SPs are
requesting, letting the user either accept the attribute transfer
to get the resource or refuse it and be denied the resource.
The IdPs do not allow the user to select which attributes
he wants to disclose. Another issue in federated
identity infrastructures is that a user who belongs to many different
IdPs has to maintain more than one set of authentication
information. Additionally, if the user's IdPs manage the same
set of user attributes, the values of those attributes must not
differ between IdPs.

In relation to document signatures in X.509 PKI, there are
limitations in terms of the management of the signer's attributes
in the signature. The signature information is the same
as what is in the signer's public key certificate. The verification
procedure is also complex, depending on the validity of the
signer's PKC. On the other hand, in ABPKI the signer includes
the necessary attributes for each document signature. The
verification of the document signature is reduced in ABPKI
through the validation by an NA. The signatures remain valid
until the cryptographic algorithm used becomes
obsolete and, after that, only another co-signature from a
notarial authority is required to maintain the validity.

VI. Considerations and Future Work 

We proposed an alternative public key infrastructure aimed
at the management of users' attributes. Our proposed model
keeps the essence of the traditional X.509 PKI and PMI, improving
their usage in identity management, access management
and document signatures. Instead of digital certificates and all
the verification processes they need (e.g., certificates in the
certification path, revocation lists), no public key certificate is used
in our approach. The notarial authority signature maintains the
trustworthiness of the user's claimed attributes, simplifying the
verification processes and keeping the infrastructure lighter.
The simplicity of ABPKI makes its use possible in many
environments where X.509 PKI would not be possible.

Following real-world practice, the responsibilities of notaries are
used to provide a trusted third party that proves users' attributes. We
can compare an NA to a Root CA (in X.509 PKI) or an
Attribute Authority (in X.509 PMI), whose private key is used
to sign end users' certificates. In this way, the NAs make
the ABPKI a distributed scheme whose scalability can
be increased.

An attribute provider is comparable to an identity provider.
Because attribute validation is requested from the AP itself, the
users' attributes tend to be up to date and trustworthy when
validated. No attribute needs to be copied to other APs,
thus avoiding the problem of replication and incompatibility
of user information values that some identity providers
may suffer. The user-centric paradigm increases the users'
control and knowledge about which attributes are important
and necessary each time an authentication or authorization
procedure is performed.

Devices and situations that lack the requisites
to apply the X.509 PKI functions could work with
ABPKI. For future work, we suggest quantifying how
much simpler ABPKI is, with a focus on ubiquitous computing
environments. Another objective is to improve the anonymity
of the model, so that the public key would not be traceable,
making the linkage of many AAAs difficult. A solution for this
could be the use of Anonymous Credentials, which permit the
owner of the AAA to prove the affirmation of the attributes'
values without revealing any additional information about the
user [41].
Abstract — Shortest path searching is the problem of
reaching a destination in efficient time along the shortest
path. Accordingly, shortest path searching systems have been
developed as tools for reaching a destination without
spending a lot of time. This paper implements the
visualization of shortest path search results for
government agency locations on a map using the ant colony
algorithm. The ant colony algorithm is a probabilistic
technique that is driven by ant pheromone. The shortest
path search considers factors such as traffic jams, road
direction, departure time and vehicle type. Testing is done
to obtain the ant trail intensity controlling constant (α),
used to calculate the probability of the route selected by
an ant, and the visibility controlling constant (β), so that
the optimal route is obtained. The testing results show that
the worst accuracy was reached when α = and β = 0. On the
other hand, the accuracy is close to 100% for some
combinations of the parameters, such as (α = 0, β = 1),
(α = 2, β = 1), (α = 0, β = 2), (α = 1, β = 2) to
(α = 2, β = 5), which shows that these results are close to
the best result. Tuning the parameters α and β is the main
priority in shortest path searching, because the resulting
values are used in the pheromone probability value.

Keywords - shortest path; map visualization; Ant Colony algorithm; 
government agency location 

I. Introduction 

The growth in the volume of transport results in denser
traffic, especially at certain hours. The resulting congestion
affects the community's day-to-day activities by making it
harder to reach a location on time. Therefore, to
facilitate community activity, shortest path search systems have
been developed so that travel does not consume a lot of time.
Among the places frequently visited by the community are the
locations of government agencies such as the governor's office,
immigration office, tax office, liaison office and others.



To obtain the optimal path, many shortest path
algorithms have been developed. One of the algorithms often used
is the Ant Colony. An analysis conducted by
Nan Liu et al. showed that the algorithm is quite stable against
changes in the values of its parameters [1]. The Ant Colony
Algorithm is adopted from the behavior of an ant colony,
known as the ant system, in finding the path from the colony
to a food source [2]. The path can be found because of the
pheromone marks left by other ants. When the paths found have
the same distance, the first route found is chosen [3]. The
location search using Ant Colony is influenced by the
pheromone-based probability with which ants choose the desired
location. The higher the probability associated with a path,
the more likely the ants are to move to that location [4].

The optimal path that has been generated is often stated
as a sequence of street names to be followed. This is
difficult for residents or newcomers who do not know
the location of the roads. For this purpose, it is necessary to
visualize the result on a map. Accordingly, this
study develops an application that seeks the shortest route to
government agency locations using the Ant Colony algorithm and
visualizes the route on the map. The shortest route search
considers departure hour, traffic density, vehicle type and road
direction.

II. DATA AND METHOD

A. Data 

The data used in this study is a digital map of Banda
Aceh municipality in PNG (Portable Network Graphics) format,
together with attribute data such as government agency locations
and road data. The government agency data consist of the
agency_name and agency_id attributes, while the road data
consist of attributes such as street_id, street names,
road_distance, road_width, road_direction and the traffic density
of each way per hour. These attributes are stored in XML
(Extensible Markup Language) format. The data were fitted to the
road capacity manual produced by the Directorate General Bina Marga [5].






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



B. General Flow of Application 

The application has several stages. First, the map data
obtained from the XML format are initialized. The shortest
path is then calculated using the Ant Colony algorithm, whose
inputs are the starting point, the destination point, the
departure time and the vehicle type. Finally, the resulting
route is shown on the map. The general flow diagram of the
application is shown in Figure 1.

Figure 1. General flow of application (Start, map data
initialization, ant colony algorithm, route visualization on map)

C. Ant Colony Algorithm

The Ant Colony Algorithm is a probabilistic computing
technique that can be used to find the best path. In principle,
the algorithm mimics an ant colony finding the shortest trail
from the nest to a food source. The steps of the algorithm are
discussed in detail in the subsections below. The steps and the
parameter definitions are adopted from [2], [3], [6].

D. Parameter Initialization of Ant Colony Algorithm

The parameters needed to determine the shortest path are:

1. Ant trail intensity between points and its change
(τ_ij). This parameter is important in selecting the path
to be traversed by the ants.

2. Departure point and destination point.

3. The ant footprint (pheromone) intensity difference
constant (Q).

4. Constant of ant trail intensity controller (α), with α ≥ 0.
This parameter is used to calculate the probability of the
route to be passed by the ants.

5. Constant of visibility controller (β), where β ≥ 0.

6. Visibility between points, η_ij = 1/d_ij.

7. The number of ants (m), stating the number of ants on the
resulting route.

8. Constant of ant trail evaporation (ρ), which determines the
intensity of the next ant trail. The value of ρ should be > 0
and < 1 to prevent an infinite trail of ants.

9. The maximum number of cycles (NCmax), a fixed
parameter while the program is running.

E. Filling Point to Tabu List

The tabu list contains all the points that have been visited on
each trip. Filling of the tabu list starts at the initialization
of points, so that the first element of each list is populated
with the index of its starting point. Tabu list (k) can contain
a number of point indices between k and n initialization points.
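
The initialization parameters of Section D and the tabu list of Section E can be gathered in a small configuration object. The sketch below is illustrative only: the class name, the field names and the default values are assumptions, not values taken from the study; only the range checks follow the constraints listed in Section D.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AcoParams:
    """Ant Colony parameters; defaults are placeholders, not study values."""
    alpha: float = 1.0   # ant trail intensity controller, alpha >= 0
    beta: float = 1.0    # visibility controller, beta >= 0
    rho: float = 0.5     # trail evaporation constant, 0 < rho < 1
    q: float = 1.0       # pheromone intensity constant Q
    n_ants: int = 10     # number of ants m
    nc_max: int = 10     # maximum number of cycles NCmax
    tau0: float = 1.0    # initial trail intensity (assumed starting value)

    def __post_init__(self) -> None:
        if self.alpha < 0 or self.beta < 0:
            raise ValueError("alpha and beta must be non-negative")
        if not 0 < self.rho < 1:
            raise ValueError("rho must lie strictly between 0 and 1")
        if self.n_ants < 1 or self.nc_max < 1:
            raise ValueError("n_ants and nc_max must be at least 1")


def new_tabu_list(start_point: int) -> List[int]:
    # The first entry of an ant's tabu list is its starting point.
    return [start_point]
```

Validating the ranges once, at construction time, keeps the later probability and pheromone calculations free of repeated checks.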



F. Trip Route Selection 

Route selection starts by placing the ant colony at every
point. Each ant then moves to a point that is not yet in
tabu_k as its next point, and the ants keep visiting points in
this way. If the point of origin is expressed as tabu_k(s) and
the remaining points as {N - tabu_k}, then the probability of
visiting a point is calculated by (1).

$$P_{ij} = \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{k' \in \{N - tabu_k\}} [\tau_{ik'}]^{\alpha}\,[\eta_{ik'}]^{\beta}} \quad \text{for } j \in \{N - tabu_k\} \tag{1}$$

where P_ij is the probability of moving from point i to point j,
i is the i-th point, j is the j-th point, τ_ij is the pheromone
from point i to point j, β is the constant of the visibility
controller, α is the ant trail intensity controller, η_ij is the
visibility from point i to point j, and the sum runs over the
points k' not yet in tabu_k, i.e., the possible paths still to
be traversed.
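
A minimal Python sketch of the selection probability in (1) follows. The function name and the dictionary-based representation of pheromone and distance are assumptions made for illustration; visibility is taken as 1/d_ij, as defined in the parameter list.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def transition_probabilities(
    i: Hashable,
    candidates: Iterable[Hashable],
    tabu: Set[Hashable],
    tau: Dict[Edge, float],
    dist: Dict[Edge, float],
    alpha: float,
    beta: float,
) -> Dict[Hashable, float]:
    """Probability of moving from point i to each point j not in the tabu list."""
    allowed = [j for j in candidates if j not in tabu]
    # Numerator of (1): tau_ij^alpha * eta_ij^beta, with eta_ij = 1 / d_ij.
    weights = {
        j: (tau[(i, j)] ** alpha) * ((1.0 / dist[(i, j)]) ** beta)
        for j in allowed
    }
    total = sum(weights.values())
    if total == 0:
        return {}  # no reachable point, or all weights vanished
    # Dividing by the sum over allowed points gives the probabilities.
    return {j: w / total for j, w in weights.items()}
```

With alpha = 0 the pheromone term drops out and the choice depends on distance alone; with beta = 0 the distance term drops out and only the pheromone matters.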

G. Calculation of Path Distance and Change in Ant Footprint
Intensity Between Points

The path distance is calculated once the ants have completed
one cycle (iteration), having visited every point without
forming a closed cycle. The calculation is based on the
contents of tabu_k and uses (2).

$$L_k = d_{tabu_k(n),\,tabu_k(1)} + \sum_{s=1}^{n-1} d_{tabu_k(s),\,tabu_k(s+1)} \tag{2}$$

where d_{a,b} is the distance between points a and b and n is
the number of points in tabu_k.
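
The path distance of (2) can be sketched as below. The function name and the `close_cycle` switch are assumptions: (2) includes the edge from the last point back to the first, while a source-to-destination route would normally omit it, so the sketch offers both.

```python
from typing import Dict, Hashable, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def path_length(tabu: Sequence[Hashable], dist: Dict[Edge, float],
                close_cycle: bool = True) -> float:
    """Sum of distances between consecutive points in an ant's tabu list."""
    # Sum over s = 1 .. n-1 of d(tabu(s), tabu(s+1)).
    total = sum(dist[(tabu[s], tabu[s + 1])] for s in range(len(tabu) - 1))
    # Closing term d(tabu(n), tabu(1)) of (2); optional for open routes.
    if close_cycle and len(tabu) > 1:
        total += dist[(tabu[-1], tabu[0])]
    return total
```

For example, with distances 1.0 for (0, 1), 2.0 for (1, 2) and 4.0 for (2, 0), the tabu list [0, 1, 2] has length 7.0 with the closing edge and 3.0 without it.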

The change in ant footprint intensity can be calculated
using (3).

$$\Delta\tau_{ij}^{k} = \begin{cases} \dfrac{Q}{L_k}, & (i,j) \in \text{source and destination node on } tabu_k \\ 0, & \text{for the other } (i,j) \end{cases} \tag{3}$$

where Δτ_ij^k is the change in ant footprint intensity between
points, k is the ant number, Q is the ant cycle constant, and L_k
is the sum of all distances passed by the ant.

The ant footprint intensity between points changes in
subsequent cycles because of evaporation and differences in
the number of ants crossing each path. Therefore, the
footprint intensity for the subsequent cycle is calculated
using (4).

$$\tau_{ij} = \rho \cdot \tau_{ij} + \Delta\tau_{ij} \tag{4}$$

where Δτ_ij is the change in ant footprint intensity
between points, and ρ is the constant of ant footprint
evaporation.
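
Equations (3) and (4) together can be sketched as one update step. Two points in this sketch are assumptions rather than statements from the text: the per-ant deposits Q/L_k are summed over all ants to obtain the total change on an edge (as in the classical ant system), and each tour is treated as an open route whose length is the sum of its consecutive edges.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def update_pheromone(
    tau: Dict[Edge, float],
    tours: List[Sequence[Hashable]],
    dist: Dict[Edge, float],
    rho: float,
    q: float,
) -> Dict[Edge, float]:
    """Return the new trail intensities after one cycle.

    Every edge used by a tour must already be a key of `tau`.
    """
    delta = {edge: 0.0 for edge in tau}
    for tour in tours:
        edges = list(zip(tour, tour[1:]))
        length = sum(dist[e] for e in edges)  # L_k for this ant
        if length == 0:
            continue
        for e in edges:
            delta[e] += q / length            # (3): Q / L_k on visited edges
    # (4): scale the old trail by rho, then add the new deposits.
    return {edge: rho * tau[edge] + delta[edge] for edge in tau}
```

Edges that no ant used receive no deposit, so their intensity simply decays by the factor rho in each cycle.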

H. Emptying the Tabu List

The tabu list needs to be emptied before entering the next
cycle. If neither the maximum number of cycles (NCmax) nor
convergence has been reached, the steps are repeated
until the process stops at the maximum
iteration. The next cycle uses the updated pheromone intensity
between points as its parameter.

I. The Accuracy of Ant Colony Algorithm

According to [3], the parameter values that achieve the
optimal solution are α = 1, β = 1 to 5 and ρ = 0.5.

In this study, the parameters to be tested are
α, β, ρ and Q. Referring to the previous study [3], the
values of the parameters are set to α ∈ {0, 1, 2, 5},
β ∈ {0, 1, 2, 5}, ρ ∈ {0.3, 0.5, 0.7, 0.99} and Q ∈ {5, 10, 20}.
The accuracy of the Ant Colony algorithm can be calculated
using (5).

$$\text{Accuracy} = \frac{\text{best}}{\text{best} + \text{worst}} \tag{5}$$

where best is the number of experiments close to the best
solution and worst is the number of experiments close to the
worst solution.
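
The accuracy measure in (5) is a simple ratio; a minimal sketch is given below. The function name is an assumption, and how an experiment is judged "close to" the best or worst solution is left to the caller, since the text does not state a threshold.

```python
def accuracy(best: int, worst: int) -> float:
    """Accuracy as in (5): best / (best + worst)."""
    total = best + worst
    if total == 0:
        raise ValueError("at least one experiment is required")
    return best / total
```

For instance, three experiments close to the best solution and two close to the worst give 3 / (3 + 2), i.e., an accuracy of 60%.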

J. Solving the Shortest Path Based on Travel Time

Solving the shortest route means obtaining the route
with the smallest weight or the fastest journey time.
Reaching the destination in the fastest time is influenced by
several constraints, such as traffic density.

Obtaining the journey time of the shortest path
requires the distance between points (s). Here, s is the distance
between points on each route, s_1, s_2, s_3, ..., s_k. The
total cost field is calculated using (6).

$$\text{Total cost field} = D_1 + D_2 + \dots + D_N = \sum_{i=1}^{N} D_i \tag{6}$$

where D is the selected route and N is the number of distances
between points (segments).

Once the distance is calculated, the travel time taken
by the route is also calculated, taking vehicle speed into
account. Vehicle speed is denoted by V_average, i.e., the average
vehicle speed (km/h). The fastest travel time (hours) is
calculated using (7).

$$\text{Fastest cost field (CF)} = t = \frac{\text{distance}}{\text{speed}} \tag{7}$$

The total cost field can then be calculated using (8).

$$\text{Total cost field} = E_1 + E_2 + \dots + E_n = \sum_{i=1}^{n} E_i \tag{8}$$

where E is the selected route in the fastest-time analysis.

III. RESULT AND DISCUSSION

The interface of the application is shown in Figure 2. Part A is
used to select the point of origin and point of destination, the
vehicle used and the time of departure. Part B is
the area for filling in the values of the parameters used in the
calculation. Part C contains a button that starts the shortest
path search. Part D shows the resulting route to be traversed,
with the shortest distance and the time required. Part E
displays the best path found on the map.

Figure 2. The user interface of the application

In this research, testing is done to determine the
effect of the α and β values on the resulting distance and to
find out the accuracy of the Ant Colony algorithm.

The first test combines α and β values to
find the best combination. Testing is performed at Q =
1, ρ = 0.1 and cycle = 10, with the parameter values
α ∈ {0, 1, 2, 5} and β ∈ {0, 1, 2, 5}. Each
combination of α and β is tested five times. The
example route search uses Jln. Tengku Muda as the origin
point and the office of the liaison agency at
Jln. Mayjend. Hamzah Bandaharal as the destination point.


TABLE 1. THE TESTING RESULT AT α ∈ {0, 1, 2, 5} AND β = 0

Testing   α = 0   α = 1   α = 2   α = 5
1         1.396   1.498   1.498   1.067
2         1.067   1.722   1.362   1.53
3         1.396   1.067   1.396   0.689
4         1.881   1.458   1.53    1.498
5         1.458   1.436   1.53    1.586

TABLE 1 shows the results of testing with test parameters
α ∈ {0, 1, 2, 5} and β = 0. The test results show that the
shortest path, about 0.689 km, is obtained in the 3rd experiment
with α = 5 and β = 0.

TABLE 2. THE TESTING RESULT AT α ∈ {0, 1, 2, 5} AND β = 1

Testing   α = 0   α = 1   α = 2   α = 5
1         1.067   1.601   0.595   1.067
2         1.067   1.897   1.067   1.067
3         1.067   0.595   1.067   1.067
4         1.067   0.595   0.987   1.067
5         1.067   1.067   0.696   1.396

TABLE 2 shows the results of testing with test parameters
α ∈ {0, 1, 2, 5} and β = 1. This test shows that the shortest
distance, 0.595 km, is found in the first test with α = 2 and
β = 1, in the third test with α = 1 and β = 1, and in the fourth
test with α = 1 and β = 1.

TABLE 3. THE TESTING RESULT AT α ∈ {0, 1, 2, 5} AND β = 2

Testing   α = 0   α = 1   α = 2   α = 5
1         0.595   1.067   1.067   0.595
2         0.595   0.595   0.595   0.595
3         0.595   1.067   1.067   0.595
4         0.595   0.595   0.987   0.595
5         1.067   0.595   0.595   0.696

TABLE 3 shows the results of testing with the test
parameters α ∈ {0, 1, 2, 5} and β = 2. The test results
show that most of the combinations produce the shortest
distance of 0.595 km.

TABLE 4. THE TESTING RESULT AT α ∈ {0, 1, 2, 5} AND β = 5

Testing   α = 0   α = 1   α = 2   α = 5
1         0.595   0.595   0.595   0.595
2         0.595   0.595   0.595   0.595
3         0.595   0.595   0.595   0.595
4         0.595   0.595   0.595   1.601
5         0.595   0.595   0.595   0.595

TABLE 4 shows the results of testing with the
parameters α ∈ {0, 1, 2, 5} and β = 5. The test
results show that almost all combinations produce the shortest
distance of 0.595 km. This suggests that with this parameter
combination the ant algorithm approaches stability.

Based on the results of the tests on several combinations of
the parameters α and β above, it can be seen that the Ant Colony
produces the same shortest distance when β is between 1 and 5,
and approaches stability when β = 5.

The accuracy of the results is represented by the
best and the worst outcomes for each parameter combination
tested. The accuracy of the test results in this study is shown in
TABLE 5. The test results show that changes in
the parameters of the Ant Colony algorithm affect the resulting
accuracy. With α = 0 and β = 0 the algorithm
produced an accuracy of 20%, whereas with α = 2 and β = 0 the
accuracy fell to its worst value. It can also be
seen from the table that the Ant Colony produces the best
accuracy, approaching the optimal path, at α = 0 and β = 1;
α = 2 and β = 1; α = 0 and β = 2; and α = 1 and β = 2 up to
α = 2 and β = 5.

TABLE 5. ACCURACY TEST RESULTS OF ANT COLONY

α     β     Accuracy
0     0     20%
1     0     (illegible)
2     0     0%
5     0     (illegible)
0     1     100%
1     1     60%
2     1     100%
5     1     80%
0     2     100%
1     2     100%
2     2     100%
5     2     100%
0     5     100%
1     5     100%
2     5     100%
5     5     80%



IV. CONCLUSION 

In this study, the Ant Colony algorithm is implemented to
search for the shortest path to Government Agency locations and
to visualize this path. The algorithm is represented on the map
by taking the coordinates on the image map as reference
points. The test results show that the best accuracy is
obtained with the combinations α = 0 and β = 1; α = 2 and β =
1; α = 0 and β = 2; α = 1 and β = 2; and α = 2 and β = 5,
while the worst accuracy is obtained when α = 2 and β = 0.
Abstract — Multipath routing is the use of multiple potential paths
through a network in order to enhance fault tolerance, optimize 
bandwidth use, and improve security. Selecting data flow paths 
based on cost addresses performance issues but ignores security 
threats. Attackers can disrupt the data flows by attacking the 
links along the paths. Denial-of-service, remote exploitation, and 
other such attacks launched on any single link can severely limit 
throughput. Networks can be secured using a secure quality of 
service approach in which a sender disperses data along multiple 
secure paths. In this secure multi-path approach, a portion of the 
data from the sender is transmitted over each path and the re- 
ceiver assembles the data fragments that arrive. One of the larg- 
est challenges in secure multipath routing is determining the se- 
curity threat level along each path and providing a commensu- 
rate level of encryption along that path. The research presented 
explores the effects of real-world attack scenarios in systems, and 
gauges the threat levels along each path. Optimal sampling and 
compression of network data is provided via compressed sensing. 
The probability of the presence of specific attack signatures along 
a network path is determined using machine learning techniques. 
Using these probabilities, information assurance levels are de- 
rived such that security measures along vulnerable paths are 
increased. 

Keywords - Multi-path Security; Information
Assurance; Anomaly Detection.

I. Introduction 

Typical network protocols select the least-cost path for 
routing data to destinations and thus address delivery efficiency 
along a single network path. On networks using single-path 
routing, attackers can launch attacks upon any link, which seriously
compromises data integrity, availability, and confidentiality
along the path. Network countermeasures required along a
compromised path include TCP resets of the offending attack
node or nodes, which involve disrupting the flow of traffic on the
path for a period of time, and switching to an alternate path.
Nevertheless, deploying these countermeasures generally re- 
quires manual intervention and an associated switching time 
[1]. Having multiple paths available for traffic propagation 
hinders an attacker's ability to focus the attack on a single rout- 
ing path. However, multipath traffic propagation conversely 
introduces complexity into the system: using multiple paths 
requires sophisticated packet-reordering methods and buffering 
methods [2], [3]. In a fully secure multipath network a sender 



simultaneously transmits data over multiple paths with varying 
levels of security enabled along each path. The level of security 
along each path should reflect a measured threat level on the 
path and be dynamically adjusted as the attack environment 
varies. 

Despite the importance of associating and adjusting the security
level to each path in multipath routing, existing multipath
routing protocols such as Multipath TCP lack the ability to
actively determine the level of security threats along a path [4],
[31].

In this paper, we present a novel approach that utilizes 
compressed sensing (CS) [13] and machine learning techniques 
to determine the information assurance level of network paths 
in multipath networks. Compressed sensing (CS) allows net- 
work data to be optimally sampled below the normally required 
Nyquist 2X signal sampling frequency while simultaneously 
compressing data and lowering data dimensionality. CS data
compression reduces the storage needed for large data windows by up
to a factor of 10. The combination of data compression and
data dimensionality reduction effectively filters out non- 
contributing network traffic features which increases the effi- 
ciency and data handling capabilities of anomaly detection al- 
gorithms used in network path security determination. 

Compared to other types of multipath network security 
methods, the proposed approach is based on recognizing real- 
world attack patterns within compressed and dimension re- 
duced data sets. Additionally, most multipath security schemes 
are based on hypothetically derived trust models while the pro- 
posed approach finds the likelihood of the presence of real- 
world attack patterns in data event windows and assigns infor- 
mation assurance levels to paths that can be subsequently uti- 
lized to actively adjust path security measures [1-8]. 

The remainder of this paper is organized as follows. We 
provide in Section II a review of related work. Section III 
presents the compressed sensing - signature cluster path secu- 
rity determination methods. In section IV evaluation results 
are presented, and finally in section V conclusions are dis- 
cussed. 

II. Background 

In multipath routing, data is transmitted along multiple 
paths to prevent fixed unauthorized nodes from intercepting or 
injecting malicious data onto a network. Ideally, the simplest 
form of multipath routing entails using no encryption and data
is split among different routes in order to minimize the effects 
of malicious nodes. The approach in [5] uses existing multiple 
paths such that an intruder needs to spread resources across 
several paths to seriously degrade data confidentiality. In the 
approach of [6], one path is used as a central path while the 
other paths are alternatives. When the central path's performance
is seriously affected, one of the alternative paths is selected
as the new central path. These two multipath protocols base
their effectiveness on the ability either to disperse data along
multiple paths or to switch to alternate paths. However, neither
approach suggests an adequate or explicit means for combining
dispersive data security methods with path differentiating data
security measures.

The differentiating approach proposed in this paper is to
intelligently sense the threat level present along each network
path and correspondingly increase the encryption strength on
more vulnerable paths while decreasing it on the less vulnerable
ones. In order to maintain overall throughput, the transmission
rate on more vulnerable paths will drop, while it will increase
on the less vulnerable ones. The proportional multipath
encryption and routing approach is expressed in Eq. (1) and
maintains a secure quality of service (SQoS). Packets are
proportionally routed over paths $P_i$ and $P_j$ according to the
values $I$, $C$, and $E$ over graph edges, which are defined shortly.

III. Network Path Security Determination 

Given a network, let $I$ be the information assurance factor,
$C$ be the link cost factor (i.e., OSPF cost), and $E$ be the
encryption scaling factor. The values of these factors differ
across the distinct edges (links) of a network. To differentiate
the factor values on different links, we use a subscript $i$ to
denote the factor value for an edge $e_i$; e.g., $I_i$ is the
information assurance factor for edge $e_i$. Given a message of
length $L$, we need to formulate the throughput for sending this
message from a source node $v_s$ to a destination node $v_e$ when
using multipath routing by leveraging these factors. In general,
if the set of paths used to send a message is $P$, containing
$|P|$ paths, and the length of a path $P_i$ is $n_i$, then the
throughput is defined as follows, where $I_{ij}$, $C_{ij}$, and
$E_{ij}$ are the factors of the $j$-th edge on path $P_i$.

$T_{v_s \to v_e} = L \cdot \sum_{i=1}^{|P|} \sum_{j=1}^{n_i} I_{ij} C_{ij} E_{ij}$    (1)

For example, assume that the network routing algorithm
decides to use two paths $P_i$ = path($v_1$, $v_6$, $v_3$, $v_4$, $v_2$)
and $P_j$ = path($v_1$, $v_6$, $v_5$, $v_7$, $v_2$) to send a message
with length $L$ from $v_1$ to $v_2$ in Figure 1. Then, its
throughput is:

$T_{v_1 \to v_2} = L \cdot \sum_{k=1}^{n_i} I_{ik} C_{ik} E_{ik} + L \cdot \sum_{k=1}^{n_j} I_{jk} C_{jk} E_{jk}$    (2)
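
As a rough illustration of how the throughput expression could be evaluated, the following sketch assumes that Eq. (1) is read as the message length multiplied by the sum of the products I, C, and E over every edge of every path. The function name, the tuple layout, and the numeric factor values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumption: one edge is described by its (I, C, E) factors,
# and a path is simply the list of its edges.
Edge = Tuple[float, float, float]


def multipath_throughput(message_length: float, paths: List[List[Edge]]) -> float:
    """Evaluate L * sum over paths and edges of I * C * E (assumed reading of Eq. (1))."""
    total = 0.0
    for path in paths:
        for i_factor, c_factor, e_factor in path:
            total += i_factor * c_factor * e_factor
    return message_length * total


# Two hypothetical 4-edge paths, mirroring the two-path example around Eq. (2).
p_i = [(0.5, 1.0, 2.0)] * 4    # every edge contributes 0.5 * 1.0 * 2.0 = 1.0
p_j = [(0.25, 2.0, 1.0)] * 4   # every edge contributes 0.25 * 2.0 * 1.0 = 0.5

print(multipath_throughput(10.0, [p_i, p_j]))  # 10 * (4.0 + 2.0) = 60.0
```

The two-path call corresponds to the two summation terms of Eq. (2), one per path.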



The throughput to destination vertex $v_e$ is maintained, but
the encryption scaling factors $E$ are dynamically adjusted
according to the values of the information assurance factor $I$
over each edge.

It will be shown that the information assurance factors $I$
along a path can be derived by finding the likelihood of the
presence of attack signature patterns within a defined event
window of network traffic (Section III.D). Link encryption
factors $E$ and link cost factors $C$ are inversely proportional to
the value of information assurance factors $I$. Derivation of
factors $E$ and $C$ in maintaining SQoS and throughput $T$ is
reserved for future research.

Our approach determines the security levels of network paths
by examining the traffic data with different temporal partitions.
In particular, the network traffic is partitioned into event
windows, where each window collects data over a 30-minute
sampling period. For each 30-minute event window, we collect
$N$ samples from the network traffic for a single path.
For each event window, our approach performs traffic sampling,
anomaly detection, and path security determination as
shown in the diagram of Figure 2.



[Figure 2 shows three processing stages in sequence: Traffic
Sampling (CS); Anomaly Detection (CS-PCA Based); Determine Path
Security (Cluster Analysis). The accompanying annotations read:
find active features of interest; find likelihood of the presence
of attack signatures; determine information assurance factor on
path.]

Figure 2: Processing Flow for Traffic in One Event Window

As Figure 2 shows, compressed sensing (CS) [13] is used
to optimally sample network traffic data and store them in a
compressed form (Section III.A). Behavioral anomaly detection
is conducted on the CS data (Section III.B and Section
III.C). The compressed data are passed to the path security
determination process (Section III.D), which performs cluster
analysis on the traffic samples. Significant clusters are
inspected for the presence of active attack signature features,
and the likelihood of a respective cluster containing attack
signatures is calculated (Section III.D). Given the likelihood of
specific attack features being present on a path $P_i$, the cyber
threat level $W_r$ and information assurance factor $I_i$ are
determined using Eq. (13) and Eq. (14), which are discussed in
Section III.E. In what follows, we discuss every step of Figure 2
in detail.



Figure 1: Multipath Graph

A. Traffic Sampling

Network packet header, network time, port, protocol, and 
flags are collected at each router interface. An event window 
corresponds to a set of TCP/IP packet records for a path transit- 
ing a set of subnets or virtual LANs (VLAN) contained within 
an autonomous system. 

For each event window, we first define several notations 
used in the process of data sampling and the later discussions. 

• $f$: one feature that is abstracted from the network packet
records. When there are multiple features, we also use $f_i$ to
denote the i-th feature.

• $N_f$: the total number of features that are of interest. In
this research, a total of 19 features were extracted from the
captured network packets. These features correspond to a
specific subset of TCP, ICMP, HTTP, and OSPF protocol
states which are most often associated with router and host
attacks.

• $X_f$: samples for a feature $f$ in one event window. It records
the number of samples of feature $f$ at different sample
moments. If there are $N$ sample moments, then $X_f$ is an
$N$-dimensional column vector.

• $N$: the number of samples that we take for one event window.

• $\Phi = (X_1, X_2, \ldots, X_{N_f})$: an $N \times N_f$ matrix with
the $N$ samples that are taken for an event window. Here, $X_i$ is
represented as an $N$-dimensional column vector.

Three separate network-based attack suites, namely reconnaissance,
vulnerability scanning, and exploitation, were used in emulating
real-world host and network conditions. Each suite possesses
a unique signature ($S_r$). A threat level ($W_r$) is assigned
to each attack suite type, ranging from 1 for least severe
to 5 for most severe. Table 1 shows the detailed information of
the network attack suites that we used in this research.



Table 1
Network Attack Suites

Suite | Description                                                         | Active Features | Threat Level ($W_r$) | Signature ($S_r$)
------+---------------------------------------------------------------------+-----------------+----------------------+-------------------------------
1     | Cloud Guest Reconnaissance, Vulnerabilities & Exploitation          | 6               | 3                    | {f1, f2, f3, f4, f9, f11}
2     | Cloud Infrastructure Reconnaissance, Vulnerabilities & Exploitation | 6               | 5                    | {f1, f5, f6, f7, f8, f9}
3     | Cloud Services Reconnaissance, Vulnerabilities & Exploitation       | 5               | 4                    | {f1, f5, f8, f9, f10}




Both compressed and uncompressed data in event windows
were used in the analysis. Compressed data in event windows
were created by sampling the uncompressed data in the
corresponding event window.

B. Data Compression Using Compressed Sensing 

Compressed data for each event window are calculated us- 
ing the CS technique [12, 13]. The theory of the CS technique 
is explained as follows. 

Compressed sensing relations are listed below. Consider the
observed data $x \in R^N$, with $Q$ representing the number of
non-zero elements. The value of $Q$ is determined by finding those
vectors where the sum of the absolute values of the columns is
minimum. This is otherwise known as the L-1 norm and is
represented by $\min_x \|x\|_1$.

$\min_x \|x\|_1$ subject to $Y = U_v x$    (3)

In Eq. (3), $U_v \in R^{M \times N}$ is an $M \times N$ orthogonal
matrix called the sensing or measurement matrix, $v$ is a random
subset of the row indices, and $Y$ is the linearly transformed
compressed data.

We note that $|v| = M$ dictates the level of compression
which is afforded when the linear transformation is performed
on $\Phi$:

$|v| \geq Const \cdot \mu^2(U) \cdot Q \cdot \log N$    (4)

$\mu(U) = \max_{i,j} |U_{i,j}|$    (5)

$Y = U_v x$ is a linear transformation reducing the data
dimensionality from $N$ to $M$, with the columns of $U_v$
normalized to unit norm. If the sparseness of $x$ is considered, a
dimension $k$ represents the components with high variance, and
$M$ is chosen such that $M > k$.

From Eq. (4), the CS sampling rate which yields the best
results is captured in Eq. (6), where $\varepsilon$ is a constant
proportional to the number of active features and $M$ is the
number of samples to be taken.

$M = \varepsilon \cdot \sqrt{N} \cdot \log N$    (6)
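
To make the sample-count relation concrete, the sketch below computes M from Eq. (6) and draws a random subset v of row indices with |v| = M. It is only an illustration: the natural logarithm, the rounding and capping rule, the parameter name `epsilon` for the constant in Eq. (6), and the window size of 1800 samples are all assumptions, not values stated in the paper.

```python
import math
import random


def cs_sample_count(n: int, epsilon: float) -> int:
    """Eq. (6): M = epsilon * sqrt(N) * log(N).

    Assumptions: natural logarithm, result rounded up and kept within [1, N].
    """
    m = math.ceil(epsilon * math.sqrt(n) * math.log(n))
    return max(1, min(n, m))


def choose_row_subset(n: int, m: int, seed: int = 0) -> list:
    """Draw the random subset v of row indices (without repetition), |v| = M."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n), m))


n = 1800                                # hypothetical number of samples in one event window
m = cs_sample_count(n, epsilon=1.0)     # hypothetical constant
v = choose_row_subset(n, m)
print(m, m / n)                         # number of retained samples and compression ratio M/N
```

Only the rows of the sensing matrix indexed by v would then be applied to the event window, which is what reduces the dimension from N to M.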



C. Anomaly Detection 

The previous step calculates the compressed data $Y$ from the
original traffic data $\Phi$ for each event window. Our anomaly
detection component detects the event windows that may contain
traffic with anomalous behavior. In this section, we first
describe the detailed steps of this component. Then, we explain
the theory behind each step.

The anomaly detection component works as follows. The
first step performs Principal Component Analysis (PCA) on the
compressed data from one event window; i.e., PCA is performed
on a covariance matrix of $Y$. The second step applies a
residual analysis over the original data and calculates a
squared prediction error. If the prediction error for one feature
is larger than a threshold, then that event window is considered
to contain anomalous behavior.

The compressed event window is represented by $Y$ in Eq. (7).

$Y = U_v \Phi$    (7)



The sampling matrix $U_v$ projects $x$ to a residual subspace;
however, eigenvalue decomposition of the covariance matrices of
$x$ and $Y$ yields near-identical eigenvalue magnitudes from
which anomaly detection can be derived [10]. This fact allows
one to inspect the compressed data samples, $Y$, for anomalies,
reducing the computational complexity to $O(M^3)$ and the storage
of the data to $O(M^2)$ with $M = O(k \log N)$. This is a
substantial running time reduction compared to PCA over the
covariance matrix of $x$, which requires $O(N^3)$ computations and
memory storage of $O(N^2)$.

A residual analysis method [9] decomposes the observed
data $x$ (in our case, one row vector in $\Phi$) into a principal
subspace, which is believed to govern the normal characteristics,
and a residual subspace. Within the residual subspace, in which
$Y$ resides, abnormal characteristics can be found. The residual
method performs the eigenvalue decomposition of the covariance
matrix of $x$, from which the $k$ principal eigenvectors $E$ are
obtained. The projection of a data instance $x$ onto the residual
subspace is

$z = (I - E E^T)\, x$    (8)



Assuming the data is normal, the squared prediction error
is $\|z\|_2^2$, which follows a non-central chi-square distribution.
Anomalous activity is detected when the squared prediction
error $\|z\|_2^2$ exceeds a certain threshold called the Q-statistic,
which is a function of the non-principal eigenvalues in the
residual subspace and is approximated by

$Q_\beta = \theta_1 \left[ \frac{c_\beta \sqrt{2 \theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0}$    (9)

where $h_0 = 1 - \frac{2 \theta_1 \theta_3}{3 \theta_2^2}$,
$\theta_i = \sum_{j=k+1}^{N} \lambda_j^i$ for $i = 1, 2, 3$,
$c_\beta$ is the $(1 - \beta)$ percentile in a standard normal
distribution, and $\lambda_j$ are the eigenvalues of the covariance
matrix. Anomalies are detected when the prediction error
$\|z\|_2^2 > Q_\beta$ [9].
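
The threshold test can be sketched numerically as follows. The code assumes the usual form of the Q-statistic approximation from the residual-analysis literature, computed from the eigenvalues of the residual (non-principal) subspace; the function names, the example eigenvalues, and the choice of beta are hypothetical and serve only as an illustration.

```python
import math
from statistics import NormalDist


def squared_prediction_error(z):
    """Squared 2-norm of the residual projection z."""
    return sum(value * value for value in z)


def q_statistic(residual_eigenvalues, beta=0.01):
    """Approximate Q-statistic threshold from the residual-subspace eigenvalues.

    theta_i is the sum of the i-th powers of the residual eigenvalues (i = 1, 2, 3)
    and c_beta is the (1 - beta) percentile of the standard normal distribution.
    """
    theta1, theta2, theta3 = (
        sum(lam ** power for lam in residual_eigenvalues) for power in (1, 2, 3)
    )
    h0 = 1.0 - (2.0 * theta1 * theta3) / (3.0 * theta2 ** 2)
    c_beta = NormalDist().inv_cdf(1.0 - beta)
    inner = (
        c_beta * math.sqrt(2.0 * theta2 * h0 ** 2) / theta1
        + 1.0
        + theta2 * h0 * (h0 - 1.0) / theta1 ** 2
    )
    return theta1 * inner ** (1.0 / h0)


# Hypothetical residual eigenvalues and residual vectors.
threshold = q_statistic([1.0, 1.0, 1.0, 1.0], beta=0.01)
print(squared_prediction_error([3.0, 2.0, 1.0]) > threshold)  # flagged as anomalous
print(squared_prediction_error([1.0, 1.0, 1.0]) > threshold)  # treated as normal
```

With four unit eigenvalues the threshold comes out close to the 99th percentile of a chi-square distribution with four degrees of freedom, which is a useful sanity check on the approximation.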

D. Determination of Path Security 

Using the approach discussed in Section III.C, volume- 
based anomalous behavior within an event window is identi- 
fied. Such anomalous behavior provides an indication that an 
attack may exist within this event window. If anomalous be- 
havior exists in an event window with high probability, then 
this component attempts to determine the security level for 
paths in this event window by using hierarchical clustering 
techniques described in this section. 

Agglomerative hierarchical clustering was chosen as the 
method for deriving anomalous and baseline models because of 
its ability to identify clusters without providing an initial esti- 
mate of the number of clusters present. Agglomerative hierar- 
chical clustering algorithm iteratively groups data points or 
clusters of points to form new clusters. Each iteration results in 
the previously found points and clusters being clustered with 
another point or cluster. Generally, hierarchical clustering of
sizable data sets produces a large number of clusters,
many of which contain a small fraction of the samples. A
straightforward approach to prioritizing clusters is to eliminate
the minor clusters by cutting the lower tiers of the hierarchical
dendrogram.

Once clusters are identified in an event window (Algorithm 
1), determination of which clusters contain attack signature 
features of high magnitude is conducted (Algorithm 1). The 
path information assurance factor is calculated (Algorithm 2). 

In order to lower computational complexity, only those 
event windows found to have volumetric anomalies in the re- 
sidual subspace are used in determining network path security. 
The relative magnitudes and spectral properties of each feature 
in principal subspace are calculated, and the uncompressed 
form of each anomalous event window is analyzed. A signature 
consisting of a distinct collection of significant features is asso- 
ciated with each attack suite; thus, the nature of significant fea- 
tures contained within the traffic data of an event window is 
captured in a hierarchical clustering as illustrated in Figure 3. 




Figure 3: Hierarchical Clustering of Data Event Windows, vertical 
axis distance, horizontal axis cluster number. 

Algorithm 1 is used to generate the clustering dendrogram 
as illustrated in Figure 3. Algorithm 1 implements a modified 
hierarchical agglomerative clustering algorithm that merges 
clusters until a minimum distance threshold between clusters 
is reached or all the clusters are merged to only one. When a 
minimum distance threshold is used, the algorithm ensures 
maximum partitioning of data into feature-rich clusters and 
increases the probability that the top-tier clusters contain a full 
attack signature feature set. 
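
The threshold-stopped merging described above can be sketched in a few lines. This is a minimal illustration rather than the paper's Algorithm 1: it assumes Euclidean distance between samples, complete linkage (the largest pairwise distance between members of two clusters), and it stops before performing a merge whose distance exceeds the threshold. The function names and sample points are hypothetical.

```python
import math


def complete_linkage(cluster_a, cluster_b):
    """Largest pairwise Euclidean distance between members of two clusters."""
    return max(math.dist(x, y) for x in cluster_a for y in cluster_b)


def cluster_until_threshold(samples, delta):
    """Agglomerative clustering that stops once the nearest pair is farther than delta."""
    clusters = [[tuple(sample)] for sample in samples]   # every sample starts alone
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                distance = complete_linkage(clusters[i], clusters[j])
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        distance, i, j = best
        if distance > delta:
            break                                        # nearest clusters are too far apart
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical two-feature samples: two tight groups and one isolated point.
samples = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
result = cluster_until_threshold(samples, delta=2.0)
print(sorted(sorted(cluster) for cluster in result))
```

This brute-force search is cubic in the number of samples and is meant only to show the stopping rule; a production implementation would normally rely on an optimized clustering library.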

Algorithm 1 takes as input (1) $SSig_{attack}$, the set of attack
signatures, each of which consists of several features, (2) $\Phi$
with the $N$ samples, where each sample is a row vector in $\Phi$,
and (3) $\delta$, the distance threshold to stop cluster merging.
This algorithm groups the samples in one event window into $c$
clusters (Lines 3-8). Then, for each cluster, it finds the attack
signature that has the highest probability to match the cluster's
features (Lines 10-21). The signatures and matching probabilities
for all the clusters are put into $SSig$ and $PProb$, respectively.
This algorithm outputs a triple ($C$, $SSig$, $PProb$) where $C[i]$
is the i-th derived cluster and $SSig[i]$ contains the attack
signature with the highest probability (in $PProb[i]$) to match
the features of $C[i]$.
In this algorithm, $c$ represents the total number of clusters
found so far, and is initialized to 0 (Line 2). $D_i$ is a cluster
that is being processed. Initially, $D_i$ contains the i-th
sample in one event window. $C$ is the set of clusters finally
derived, and is initialized as an empty set.

Algorithm 1: SignatureMatchProb(SSig_attack, Φ, δ)

1  begin
2    initialize c = 0; C = {}; D = {D_1, ..., D_N} where D_i is the i-th sample;
3    do /* merge clusters */
4      c = c + 1
5      find the two nearest clusters D_i and D_j from D
6      merge D_i and D_j to a new cluster and insert the new cluster to C
7    until dist(D_i, D_j) > δ
8    end do
9    i = 0; PProb = {}; SSig = {};
     /* for each cluster, find the attack signature
        which matches this cluster's features with the highest probability */
10   for each cluster D_i (∈ C) do
11     k = 0; maxProb_i = 0; maxProbSig_i = {};
12     for each attack signature S_k (∈ SSig_attack)
13       H_i = extract features from D_i that also exist in S_k
14       N_i = # of H_i features with conditional entropy higher than H(F)
15       N_k = # of features in S_k
16       if (N_i / N_k > maxProb_i)
17         maxProb_i = N_i / N_k
18         maxProbSig_i = S_k
19       end if
20       k = k + 1
21     end for
22     PProb = PProb ∪ {maxProb_i}
23     SSig = SSig ∪ {maxProbSig_i}
24     i = i + 1
25   end for
26   return (C, SSig, PProb);
27 end

Lines 3-8 merge samples into $c$ significant clusters. This cluster
merging process stops when the minimum distance between the
two nearest clusters exceeds the distance threshold $\delta$. In
finding the nearest clusters among all the existing ones, both the
Ward and complete linkage methods can be utilized. Past research
[20] showed that the complete linkage method, Eq. (10), yields
the best ratio between the within group sum of squares
(WGSS) and the between groups sum of squares (BGSS), thus
indicating tighter grouping among cluster members and
optimal cluster-to-cluster spacing.

$dist(D_i, D_j) = \max\{ d(x_i, x_j) : x_i \in D_i, x_j \in D_j \}$    (10)

For each cluster $D_i$ out of the $c$ clusters in $C$, Lines 10-21
calculate the probability that its features match every attack
signature. The details of the signature matching are omitted from
the algorithm for simplicity; we discuss them here. Feature
matches are identified by measuring the entropy of each feature
within an event window. As Figure 4 indicates, the entropy of an
individual significant feature $f_i$ increases when attack suite
traffic is injected into the network.



Baseline: $F = [Prob(f_1), Prob(f_2), \ldots, Prob(f_j)]^T$

$F' = [Prob(f'_1), Prob(f'_2), \ldots, Prob(f'_j)]^T$

Figure 4: Features and entropy relationships.

The average entropy for features in the baseline traffic, $H(F)$,
is

$H(F) = -\sum_{j=1}^{N_f} Prob(f_j) \log Prob(f_j)$    (11)

The average conditional entropy for features in anomalous
traffic is

$H(F' | f'_k) = -\sum_{j=1}^{N_f} Prob(f'_j | f'_k) \log Prob(f'_j | f'_k)$    (12)
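
As a small numerical illustration of Eq. (11) and Eq. (12), the sketch below computes the entropy of a probability vector and compares a conditional entropy against the baseline entropy, which is the comparison Algorithm 1 relies on when counting matched features. The base-2 logarithm, the function names, and the probability values are assumptions made for this example only.

```python
import math


def entropy(probabilities):
    """Shannon entropy -sum(p * log2(p)); zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def exceeds_baseline(conditional_probs, baseline_probs):
    """True when the conditional entropy is greater than the baseline entropy H(F)."""
    return entropy(conditional_probs) > entropy(baseline_probs)


# Hypothetical feature probabilities for one event window.
baseline = [0.7, 0.2, 0.1]             # Prob(f_j) in baseline traffic
anomalous_given_fk = [0.4, 0.3, 0.3]   # Prob(f'_j | f'_k) in anomalous traffic

print(entropy(baseline), entropy(anomalous_given_fk))
print(exceeds_baseline(anomalous_given_fk, baseline))
```

The flatter conditional distribution has the larger entropy, so the feature in this made-up example would count toward the match.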

It is important to only classify valid clusters. Only clusters 
with active feature frequencies greater than 2% of the total 
number of samples in an event window become candidates for 
classification. In addition, of the candidate clusters those with 
the largest cophenetic distances and highest inconsistency fac- 
tor are selected for feature entropy comparisons. 

A feature $f'_k$ appearing in anomalous traffic is significant
if $H(F' | f'_k)$ is greater than $H(F)$. Significant features are
subsequently used in determining the probability that a cluster is
associated with a specific attack suite.

Figure 5: Attack Signature Matching Probabilities

Let us look at an example of signature matching probabilities.
Assume that we are given an attack signature $S_k$
($\in SSig_{attack}$) which consists of four features, $f_1$, $f_2$,
$f_3$, and $f_9$, i.e., $S_k = \{f_1, f_2, f_3, f_9\}$. Algorithm 1
finds clusters and extracts the features that exist in $S_k$. As
shown in Figure 5, for candidate clusters $D_1$, $D_2$, $D_5$, and
$D_9$, the algorithm finds matching features
$H_1 = \{f_1, f_2, f_3\}$, $H_2 = \{f_1, f_2, f_3\}$,
$H_5 = \{f_1, f_2, f_3\}$, and $H_9 = \{f_1, f_2, f_3, f_9\}$.
They all have high (>= 75%) feature matches to $S_k$.
Among these four clusters, $D_9$ has the highest feature match
probability (100%), i.e., all four features in the attack
signature $S_k$ exist in the cluster.
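
The ratio used in this example can be reproduced with plain set operations. The sketch below is only an illustration of the N_i / N_k computation: it assumes the cluster features passed in have already been filtered to the significant ones, and the function names, signature names, and extra feature labels are hypothetical.

```python
def match_probability(cluster_features, signature):
    """Fraction of the signature's features that also appear in the cluster (N_i / N_k)."""
    matched = set(cluster_features) & set(signature)
    return len(matched) / len(signature)


def best_signature(cluster_features, signatures):
    """Return the (name, probability) of the signature with the highest match."""
    best_name, best_prob = None, 0.0
    for name, signature in signatures.items():
        probability = match_probability(cluster_features, signature)
        if probability > best_prob:
            best_name, best_prob = name, probability
    return best_name, best_prob


s_k = {"f1", "f2", "f3", "f9"}
print(match_probability({"f1", "f2", "f3"}, s_k))         # 0.75, as for H_1, H_2, H_5
print(match_probability({"f1", "f2", "f3", "f9"}, s_k))   # 1.0, as for H_9

# Hypothetical second signature to show the selection step.
signatures = {"S_k": s_k, "S_other": {"f5", "f7", "f8"}}
print(best_signature({"f1", "f2", "f3", "f9", "f12"}, signatures))
```

Features of a cluster that are not part of the signature, such as the made-up `f12`, do not lower the ratio because only the signature's own features are counted.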

After calculating the signature matching probability
(Lines 13-16), the attack signature with the highest
feature-matching probability is recorded in $maxProbSig_i$ (Line 18)
and the matching probability is recorded in $maxProb_i$ (Line
17). When the highest matching probability $maxProb_i$ and the
matching attack signature $maxProbSig_i$ for each cluster are
found, they are put into the sets $PProb$ and $SSig$, respectively
(Lines 22-23).

E. Calculate Assurance Level for a Path 

Once the set of clusters $C$ is derived and the probabilities of
the presence of specific signatures in those clusters ($PProb$
and $SSig$) are calculated, the path information assurance factor
$I_i$ for network path $P_i$ is calculated using Eq. (13) and
Eq. (14). Based on domain specific cyber security threat models
[30], for each cyber threat level $W_k$ and a corresponding traffic
threat signature $S_k$ present in an event window, the likelihood
of cyber threat signatures being present is high if both $W_k$ and
$Prob(S_k)$ are high. For a path $P_i$ which consists of $c$
clusters (discovered in the previous step), we can sum up the
threat for each cluster (Eq. (13)). Then the information assurance
factor $I_i$ for $P_i$ is derived using Eq. (14):

$\Theta_i = \sum_{k=1}^{c} W_k \cdot Prob(S_k)$    (13)



International Journal of Computer Science and Information Security,
Vol. 11, No. 11, November 2013
Algorithm 3 summarizes the complete algorithm to calculate
the information assurance measurement $I$ for each event
window. It takes three parameters as input. The first parameter
is the $N \times N_f$ sample matrix $\Phi$ for an event window. Its
traffic data is composed of the baseline traffic and anomalies.
The second parameter is the set of attack threat signatures
$SSig_{attack}$, which consists of $S$ signatures. The third
parameter is the threat level $W_r$.

Algorithm 3: PathInfoAssurance(Φ, SSig_attack, W_r)

1  U_v ← GetSensingMatrix(N, M)
2  Y ← CSSample(U_v, M, Φ) /* Section III.A */
3  {Z_1, ..., Z_Nf} ← DetectAnomalies(Φ, Y) /* Section III.B */
4  if ∃ Z_i (∈ {Z_1, ..., Z_Nf}) s.t. ||Z_i||_2^2 > Q_β then
5    (C, SSig, PProb) ← SignatureMatchProb(Φ, SSig_attack, δ) /* Alg. 1 */
6    I ← PathInfoAssuranceLevel(C, SSig, PProb, W_r) /* Alg. 2 */
7    Store(Y)
8  else
9    Store(Y)
10 end if



$I_i = \frac{1}{\Theta_i}$    (14)

Algorithm 2 calculates the information assurance factor $I$
for a path by utilizing Eq. (13) and Eq. (14).



Algorithm 2: PathInfoAssuranceLevel(C, SSig, PProb, W_r)

1  initialize Θ = 0; i = 1;
2  begin
3    for the i-th cluster D_i in C
4      Calculate its corresponding threat level W_i using
       its attack signature S_i (∈ SSig) and threat level W_r
5      Get its highest feature-matching probability Prob_i (∈ PProb)
6      Θ = Θ + W_i × Prob_i /* According to Eq. (13) */
7      i = i + 1
8    end for
9    I = 1/Θ /* According to Eq. (14) */
10   return I
11 end

The path information assurance factor $I_i$ is calculated in Line
9.
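
The accumulation performed by Algorithm 2 can be sketched as follows. This is a minimal illustration, assuming each cluster has already been assigned the threat level of its best-matching signature; the function names, the sample values, and the handling of a zero threat sum (returning infinity) are assumptions that the paper does not specify.

```python
import math


def threat_sum(threat_levels, probabilities):
    """Eq. (13): sum over clusters of threat level times signature-match probability."""
    return sum(w * p for w, p in zip(threat_levels, probabilities))


def information_assurance(threat_levels, probabilities):
    """Eq. (14): the assurance factor is the reciprocal of the threat sum.

    Assumption: a path with no detected threat gets an unbounded assurance value.
    """
    theta = threat_sum(threat_levels, probabilities)
    if theta == 0:
        return math.inf
    return 1.0 / theta


# Hypothetical clusters matched to attack suites with threat levels 3, 5, and 4.
levels = [3, 5, 4]
probs = [1.0, 0.5, 0.75]
print(threat_sum(levels, probs))             # 3.0 + 2.5 + 3.0 = 8.5
print(information_assurance(levels, probs))  # 1 / 8.5
```

A larger threat sum yields a smaller assurance factor, which is consistent with the text's intent to strengthen encryption on more vulnerable paths.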



The algorithm works as follows. In Line 1, the sampled data
(feature frequency matrix) $\Phi$ is used to generate a CS sensing
matrix $U_v$ (which is an $M \times N$ matrix). In Line 2, $\Phi$ and
the sensing matrix $U_v$ are multiplied to produce $Y$, an
$M \times N_f$ matrix, using Eq. (3). In Line 3, the volume-based
anomaly detection is performed on each column of the $Y$ matrix,
and the corresponding prediction error $\|z\|_2^2$ is returned. In
Line 4, if any feature's prediction error satisfies
$\|z\|_2^2 > Q_\beta$, it is likely that attack signatures are
present in $\Phi$. Then, $\Phi$ is analyzed for the presence of
attack signatures (Line 5). Otherwise, the event window that
produces $\Phi$ is determined to have a low probability of
containing malicious content and is stored in compressed form for
possible future analysis. In Line 5, the signature-matching
component is called to determine the probability of the presence
of specific attack signatures in this event window. It produces
the data clusters $C$, the best-matching attack signatures $SSig$,
and the corresponding matching probabilities $PProb$. In Line 6,
the information assurance value $I$ for a path $P_i$ is calculated
using the set of signatures $SSig$ and their corresponding matching
probabilities (in $PProb$) to attack signatures.

F. Summary of Methodology

As mentioned in Section III.A, a total of 19 features were
extracted from the packet records. Network traffic, which
consists of the packet records, is partitioned into 30-minute
event windows. For each 30-minute event window, a traffic feature
frequency matrix $\Phi$ is extracted to contain the samples of the
19 features.

G. Complexity and Efficiency Gains

The overall computational complexity of PathInfoAssurance,
expressed in Big O notation, is as follows. Let $N$ be the
number of samples, $\theta$ be the number of non-sparse components,
$c$ be the number of clusters, $S$ be the number of attack
signatures, and $M = O(\theta \log N)$.

GetSensingMatrix          $O(N^2 \log N)$
CSSample                  $O(N^{2.373})$
DetectAnomalies           $O(M^3)$
SignatureMatchProb        $O(S N^2 \log N)$
PathInfoAssuranceLevel    $O(c)$



The following assumptions are considered when performing 
complexity analysis: 



1. The number of signatures $S$ can grow very large.

2. DetectAnomalies and its predecessors must always be 
run in order to detect zero-day attack behaviors within 
an event window. 

3. The accuracy of DetectAnomalies is assumed high 
enough that SignatureMatchProb is executed only 
when anomalies are detected. 

Taking into consideration that $M$ and $c$ are small while $N$ is
very large, the computational complexity lies primarily in
GetSensingMatrix, CSSample, and SignatureMatchProb. As it is
assumed that DetectAnomalies and its predecessors must process
each event window, the principal savings come when no
anomalies are detected and it is subsequently unnecessary to
call SignatureMatchProb.

Compressed event windows were assembled via compressed
sensing of individual anomalous event windows.

We analyzed the characteristics of the normal baseline traffic
data $B$ and the traffic data with injected malicious traffic
$\Phi$. This analysis was conducted prior to CS sampling and
subsequent path information assurance level determination.
Examples of 30-minute event windows for $B$ and $\Phi$ are shown in
Figures 6 and 7. For the event window $B$ with normal traffic
(i.e., baseline), we plotted in Figure 6 the percentage of the
frequencies of principal features.




IV. Results 

In this section, Section IV.A presents the strategies used to
collect traffic data and analyzes the characteristics of the
network traffic. Section IV.B discusses the effect of applying the
CS technique. Section IV.C then shows the accuracy of our
presented approach. Section IV.D explains the gains in running
time of our approach.

A. Characterization of Sample Data 

The goal of this research was to accurately model threats
encountered by modern cloud service providers and clients.
The most often used data sets, the DARPA/Lincoln Labs packet
traces [26], [27] and the KDD Cup data set derived from them
[28], are found to be inadequate for such modeling, as they are
both over a decade old and do not contain modern threats.
These data sets contain synthetic data that do not reflect
contemporary attacks, and have been studied so extensively
that members of the intrusion detection community find them
to be insignificant [29]. For these reasons, the data sets used in
this research consist of contemporary cloud service provider
attacks generated on large scale test networks.

In order to establish the ground truth to evaluate the accuracy
of anomaly detection, we conducted an analysis of the traffic
in a baseline event window $B$, which is free of attacks, and
the same window's traffic $\Phi$, which is injected with attack
data. In particular, the baseline traffic event windows $B$ without
anomalies were fully sampled and descriptive statistics (e.g.,
mean, standard deviation, correlations) were calculated. Then,
router and host node attacks were launched one at a time on the
network, where they were fully sampled. Descriptive statistics and
signatures for each attack were calculated. This information
established the ground truth for later analysis. The router and
host attacks were injected into the baseline data following a
random Poisson distribution to form anomalous event windows $\Phi$.



Figure 6: Baseline Data Free of Attacks (B),
vertical axis percentage, horizontal axis samples.

These features are ICMP redirect (feature $f_1$), HTTP reset
(feature $f_2$), and synchronization (feature $f_3$) packets.

Table 2 Principal Features for Baseline Network Traffic

Feature | Indicator
--------+---------------------
f1      | ICMP Redirect
f2      | TCP http [RST]
f3      | TCP http [SYN, ACK]

The principal features for the baseline data (in normal traffic)
all showed a large measure of variance over the sampling
period. The large variance measurements indicate that these
features can be adequately sensed during the CS sampling
process. For the traffic data $\Phi$ in an event window with
anomalous behavior, we plotted the percentage of major features in
Figure 7.




Figure 7: Baseline Data Plus Injected Attacks ($\Phi$),
vertical axis percentage, horizontal axis samples.

Figure 7 shows that the number of significant features in $\Phi$
increased, due to the addition of abnormally high Border
Gateway Protocol (BGP), HTTPS, OSPF, and SSH protocol
traffic resulting from router and host attack network traffic.
Figure 7 also shows that the variance for each feature is
relatively high, which indicates that CS sampling can effectively
capture data patterns.

Table 3 Principal Features for Network Traffic
with Attacks

Feature | Indicator
--------+--------------------------
f1      | ICMP Redirect
f2      | TCP http [RST]
f3      | TCP http [SYN, ACK]
St      | TCP bgp [RST, ACK]
f7      | TCP ospf-lite [RST, ACK]
f9      | TCP https [RST, ACK]
f11     | TCP ssh [RST, ACK]
fa      | TCP telnet [RST, ACK]

residual ($\xi_i$) and principal ($\lambda_i$) subspaces equal to
$|\lambda_i - \xi_i|$ and are bound by:




Attack signature data was characterized prior to pro- 
cessing. A router attack signature is shown in Figure 8, which 
indicates that features /2,/fi,/z, and f 9 are significant features. 
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Figure 8: Router Attack Signature, 
vertical axis percentage, horizontal samples. 

B. Effect of Compressed Sensing (CS) 



< 4 




conf 



M 



(16) 



Where X ± is the largest eigenvalue found in the residual sub- 
space, i = 1,.., n v . 

The false alarm rate AF is bound by: 



AF < 



+ 



2 In- 



(17) 



Traditionally, the confidence threshold 90% is used which 
makes the 12 ln—^— term small when compared to |— , Thus, 

■\l conf -\l M 

a smaller $M$ increases the probability of a false alarm and also
increases the compression error rate [10]. The obvious advantage
in using a smaller $M$ was a lower computational overhead.
Accordingly, $M$ was chosen such that the intrinsic
sparseness of $\Phi$ represented by $\varepsilon \sqrt{N}$ in Eq. (6)
yields the lowest compression error with
$\varepsilon \sqrt{N} \ll M \ll N$.

The optimal derivation of the constant $\varepsilon$ in Eq. (6) was
achieved by identifying those feature components of $x$ that had
the highest variance and magnitudes. The value of $\varepsilon$ was
found to be directly proportional to the number of active features.
The compression mean squared error (MSE) was verified by
measuring the error contained in the convex optimized
reconstruction.




Using FFT generated frequency components of $\Phi$ to generate
the sampling matrix $U_v$, the restricted isometry constant
(RIC) $\delta_k$, with $\theta$ representing non-sparse components,
was kept very small. Keeping $\delta_k$ small guaranteed good
linear transformation of $\Phi$ and high orthogonality of columns
in $U_v$.

$(1 - \delta_k) \|\Phi\|_2^2 \leq \|U_v \Phi\|_2^2 \leq (1 + \delta_k) \|\Phi\|_2^2$    (15)



The dimension M was optimally determined by using Eq. 
(6), and the error rate was measured. Because U v is highly 
orthogonal, the geometry preserving properties allow for the 
detection of volumetric anomalies in the residual subspace. 
The largest eigenvalues constituting 90% of the spectral power. With a probability of 1 - conf, where conf ∈ [0, 1] is the
confidence interval, the changes in the eigenvalues between



Figure 9: CS Reconstruction, Baseline + Injected Attacks 
(G <— Y), vertical axis frequency, horizontal samples. 
Figure 10: Compression ratio M/N versus (a) % MSE (recovered data fidelity) and (b) execution time in seconds, where N is the number of samples (without CS) and M is the number of samples (with CS).

Figure 11 shows the $\|z\|_2^2$ values calculated from the baseline and anomalous event windows.



Figure 11: Detected Anomalies, (a) Baseline and (b) Anomalous: the horizontal line $Q_\beta$ is the Q-Statistics threshold; the horizontal axis represents network traffic features; the vertical axis is $\|z\|_2^2$.

The $\|z\|_2^2$ values for the baseline network traffic features are all less than the Q-Statistics threshold $Q_\beta$, indicating an absence of network attack traffic (Figure 11(a)). Figure 11(b) shows $\|z\|_2^2$ values that are greater than $Q_\beta$, indicating an increase in the number and magnitude of network traffic attack features and a high possibility that network attack traffic is present in the event window.

Table 4 shows the relationships exhibited between optimally derived values for M, associated error, false positive alarm rate, and execution time for a 24-hour testing period with attack suites 1-3 randomly injected according to a Poisson distribution. Each of the 3 attack suites discussed in Table 1 was analyzed, resulting in an average effective anomaly detection performance greater than 93%.

Table 4 Anomaly Detection Accuracy

Attack Suite   No. of instances   Detected   False Pos.   False Neg.
1              5269               4894       12           10
2              8020               7575       24           27
3              2920               2802       12           5


C. Overall Accuracy 

The following table summarizes the observed accuracy of 
Algorithms 1, 2, and 3 in correctly detecting event window 
anomalies and in performing subsequent classifications. 

Table 5 Cluster Signature Probability Accuracy

Attack   No. of Attack   No. of Forwarded    Classified Instances    Avg. Threat Measure
Suite    instances       Attack instances    with >90% Confidence    % Accuracy
1        5269            4894                4635                    87.96%
2        8020            7575                7194                    89.70%
3        2920            2802                2703                    93.14%



Out of a total of 16,209 attack instances, residual sub- 
space anomaly detection correctly sensed 94.21% and for- 
warded those attack instances for subsequent classification. 
From the set of forwarded attack instances, Algorithm 1 cor- 
rectly classified 95.16% with an average confidence greater 
than 90%. In total, the DetectAnomalies-SignatureMatchProb 
chain identified an average of 90.27% of all attack instances 
injected along network paths with an average probability of 
correct match greater than 90%. 
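
The aggregate detection and classification figures quoted above can be cross-checked against the counts in Tables 4 and 5. The following Python sketch is only an illustrative recomputation, not part of the original system; the variable names are assumptions, and the counts are copied from the two tables.

```python
# Counts copied from Tables 4 and 5; the dictionary names are illustrative.
injected = {1: 5269, 2: 8020, 3: 2920}    # attack instances per suite
forwarded = {1: 4894, 2: 7575, 3: 2802}   # detected and forwarded instances
classified = {1: 4635, 2: 7194, 3: 2703}  # classified with >90% confidence

total_injected = sum(injected.values())
total_forwarded = sum(forwarded.values())
total_classified = sum(classified.values())

# Share of injected attacks sensed by the residual subspace detector.
detection_rate = 100.0 * total_forwarded / total_injected
# Share of forwarded attacks that were then classified with high confidence.
classification_rate = 100.0 * total_classified / total_forwarded

print(total_injected)
print(f"{detection_rate:.2f}%")
print(f"{classification_rate:.2f}%")
```

Under these counts the totals reproduce the 16,209 instances and the 94.21% and 95.16% rates reported in the text.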

D. Efficiency 

Figure 12 illustrates the efficiency gains derived when 
SignatureMatchProb is not called. The vertical axis represents 
the execution time in seconds while the horizontal axis repre- 
sents the ID of the data sets presented to the system. To illus- 
trate the efficiency gains, each data set was presented to the 
chain initially without removing non-anomalous event win- 
dows. Then the same data set was presented to the chain, but 
allowing non-anomalous windows to be dropped prior to classification. For the first data set, two of the four event windows were dropped, which led to a corresponding efficiency gain of 50% in the signature classification phase. Similar efficiency gains were recorded for all data sets.



Figure 12: CS Anomaly detection and Signature Classification: horizontal axis represents tested data sets; vertical axis is run time in seconds.

Summarizing, anomaly detection identified an acceptably high percentage of attack instances (94.21%), and 95.16% of these forwarded instances were associated with attack suite signatures with high confidence. Overall efficiency gains of over 50% were observed when non-anomalous event windows were dropped prior to classification. Additionally, using this unique combination of methods, the information
assurance factor /, on any path P, was derived with greater 
than 90% confidence. 
V. Conclusion

In this paper, we studied the problem of determining the in- 
formation assurance level for different paths in multipath net- 
works. We showed it was possible to intelligently sense and 
quantify threats along individual paths with a high degree of 
confidence. In the process, we devised a novel approach that 
combines optimal network data sampling (CS) with residual 
subspace PCA anomaly detection and probabilistic signature- 
based intrusion detection. This methodology was shown to 
simultaneously compress sampled network traffic and reduce 
data dimensionality by filtering out non-contributing network 
traffic features. On the compressed and dimension-reduced data 
set, our approach efficiently performs path threat detection and 
classification. This approach increases the efficiency and data 
handling capabilities of both the anomaly and signature-based 
detection algorithms. 

We also derived a theoretical multipath SQoS relation, Eq. 
(1). This relationship allows for the dynamic adjustment of 
security measures along each path and maintains the overall 
throughput at the same time. The determination of the security 
measures using our newly developed approach solves the most 
technically challenging portion of the multipath SQoS relation 
Eq. (1). Our approach and the multipath SQoS relations lay a 
solid foundation for the future expansion of adaptive multipath 
security approaches. 
Abstract — Steganography is the art and science of hiding data: the practice of concealing a message, image, or file within another message, image, or file. Steganography is often combined with cryptography so that even if the message is discovered it cannot be read. It is mainly used to keep private data private and to protect confidential data from misuse by unauthorized persons. In contemporary terms, steganography has evolved into a digital strategy of hiding a file in some form of multimedia, such as an image, an audio file or even a video file. This paper presents a simple steganography method for encoding extra information in an image by making small modifications to its pixels. The proposed method focuses on one particular popular technique, Least Significant Bit (LSB) embedding. The paper uses the LSB of every byte of each pixel to embed a message into an image with 24-bit (i.e. 3-byte) color pixels. The paper shows that using three bits from every pixel is robust and that the amount of change in the image is minimal and indiscernible to the human eye. For more protection of the message bits, a stego-key is used to permute the message bits before embedding them. A software tool that employs steganography to hide data inside other files (encoding), as well as software to detect such hidden files (decoding), has been developed and is presented.

Key Words — Steganography, Hidden-Data, Embedding-Stego- 
Medium, Cover-Medium, Data, Stego-Key, Stego-Image, Least 
Significant Bit (LSB), 24-bit color pixel, Histogram Error (HE), 
Peak Signal Noise Ratio (PSNR), Mean Square Error (MSE). 

I. Introduction 

One of the most important properties of digital information is the ease with which an unlimited number of copies (e.g. of text, audio and video data) can be produced and distributed, regardless of the protection of intellectual and production rights. That requires innovative ways of embedding copyright information and serial numbers in those copies.

Nowadays, the need for private and personal computer communication for sharing confidential information between two parties has increased.

One technique for solving the above-mentioned problems is steganography [11] [3]. It is the art of hiding private information, from an unwanted party, in public information that is used or sent over a public domain or communication channel. This private information needs to be undetectable and/or irremovable, especially in the case of audio and video data.

The art of hiding messages is an ancient one. Steganography (literally meaning covered writing) is a form of security through obscurity. For example, a message might be hidden within an image. One method to achieve that is by changing the least significant bits to be the message bits. The term steganography was introduced in the 15th century. Historically, steganography has been used for a long time. Messages were hidden (e.g. tattooed) on the scalps of slaves. One famous example comes from Herodotus, who in his Histories tells how Histiaeus shaved the head of his most trusted slave and tattooed it with a message, which disappeared once the hair grew back. Invisible ink has been in use for quite some time. Microdots and microfilm technology came into use after advances in photographic science and technology.

Steganography hides the private message but not the fact 
that two parties are communicating. The process involves 
placing a hidden message in a transport medium (i.e. the 
carrier). The secret message is embedded in the carrier to form 
the steganography medium. Steganography is generally 
implemented by replacing bits of data, in regular computer 
files, with bits of different, invisible information. Those 
computer files could be graphics, sound, text or HTML. The 
hidden information can be plain text, cipher text, or images. 

In paper [2], the authors suggested an embedding 
algorithm, using two least significant bits that minimize the 
difference between the old value of the pixel in the cover and 
the new value of the pixel in the stego-image in order to 
minimize the distortion made to the cover file. Experimental 
results of the modified method show that PSNR is greater than 
the conventional method of LSBs replacement. 

The distinction between steganography and cryptography should be emphasized. Steganography is the science and art of hiding information from a third party. Cryptography is the science and art of making data unreadable by a third party. Cryptography has received more attention from both academia and industry than steganography. Nowadays, steganography is becoming increasingly important for both military and commercial communities [9].

II. Steganalysis

Steganalysis is the science and art of detecting and breaking steganography. Examining the color palette is one steganalysis method for discovering the presence of a hidden message in an image. Generally, there will be a unique binary encoding of each individual color. If the image contains hidden data, however, many colors in the palette will have duplicate binary encodings. If the analysis of the color palette of a given image yields many duplicates, we might conclude with high confidence that hidden information is present.
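
As a rough illustration of the palette check described above, the following Python sketch counts palette entries whose colors become identical once the least significant bit of each channel is ignored. The function name, the representation of a palette as a list of (R, G, B) tuples, and the use of such near-duplicates as the indicator are all assumptions made for this example; it is not the procedure of any particular steganalysis tool.

```python
from collections import Counter

def count_near_duplicates(palette):
    """Count palette entries that share a colour once each channel's LSB is ignored."""
    masked = Counter((r & 0xFE, g & 0xFE, b & 0xFE) for r, g, b in palette)
    # Every entry beyond the first one in a group counts as a duplicate.
    return sum(n - 1 for n in masked.values() if n > 1)

# Hypothetical palettes, invented for the demonstration.
clean_palette = [(0, 0, 0), (64, 64, 64), (128, 128, 128), (255, 255, 255)]
suspect_palette = [(10, 20, 30), (11, 20, 30), (10, 21, 31), (200, 100, 50)]

print(count_near_duplicates(clean_palette))
print(count_near_duplicates(suspect_palette))
```

A high count relative to the palette size would be a hint, not proof, that LSB embedding has taken place.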

Steganalysts have a tough job, because of the vast number and variety of public files (e.g. audio, photo, video and text) they have to cover. Different varieties require different techniques.

Steganalysis and cryptanalysis techniques can be classified in a similar way, depending on the prior information that is known:

• Steganography-only attack: Steganography medium is 
available and nothing else. 

• Known-carrier attack: Carrier and steganography media 
are both available. 

• Known-message attack: Hidden message is known. 

• Chosen-steganography attack: Steganography medium 
as well as used steganography algorithm are available. 

• Chosen-message attack: A known message and 
steganography algorithm are used to create 
steganography media for future analysis. 

• Known-steganography attack: Carrier and 
steganography medium, as well as the 
steganography algorithm, are available. 

In [1] the author urges the steganalysis investigation of the 
three least significant bits. 

Until recently, information hiding techniques received much less attention from the research community and from industry than cryptography, but this has changed rapidly. The search for a safe and secret manner of communication is very important nowadays, not only for military purposes, but also for commercial goals related to market strategy as well as copyright.

Steganography hides the covert message but not the fact 
that two parties are communicating with each other. The 
steganography process generally involves placing a hidden 
message in some transport medium, called the carrier. The 
secret message is embedded in the carrier to form the 
steganography medium. A steganography key may be employed for encryption of the hidden message and/or for randomization in the steganography scheme.

III. HOW DOES IT WORK? 

Without any loss of generality, the paper will use the following equation to give a general understanding of the steganographic process:

cover_medium + hidden_data + stego_key = stego_medium.

The cover_medium is the file used to hide the hidden_data. A stego_key can be used if an encryption scheme (e.g. private/public key cryptography) is combined with the steganography process. The resultant file is the stego_medium, which is the same type of file as the cover_medium. In this paper we refer to the cover_image and stego_image, because the focus is on image files.

Steganography techniques are classified as follows, based on the cover modifications applied in the embedding process:

A. Least significant bit (LSB) method

This approach [19] [6] [5] [4] [14] [12] is very simple. In this method the least significant bits of some or all of the bytes inside an image are replaced with bits of the secret message. LSB substitution and masking & filtering techniques are well-known techniques for data hiding in images. Replacement of LSBs in digital images is an extremely simple form of information hiding.

B. Transform domain techniques

This approach [7] [10] embeds secret information in the frequency domain of the signal. Transform domain methods hide messages in significant areas of the cover image, which makes them more robust than the LSB approach to attacks such as compression, cropping, and some image processing.

C. Statistical methods

This approach [8] encodes information by changing several statistical properties of a cover and uses hypothesis testing in the extraction process. This is achieved by modifying the cover in such a way that some statistical characteristics change significantly, i.e. if "1" is transmitted then the cover is changed; otherwise it is left as it is.

D. Distortion techniques

In this technique [13] [18] [17] [16], knowledge of the original cover is essential in the decoding process at the receiver side. The receiver measures the differences from the original cover in order to reconstruct the sequence of modifications applied by the sender.
The simplest approach to hiding data within an image file is the least significant bit (LSB) method. If 24-bit color is used, the amount of change will be minimal and indiscernible to the human eye.

In [15], the authors combined strong cryptography schemes with steganography; the time complexity of the overall process increases, but the security achieved at this cost is well worth it. The cryptography algorithm used was the RSA public key algorithm. The complexity of pure steganography combined with the RSA algorithm (three bits) increases by 15 to 40% in comparison with two-bit pure steganography combined with RSA. The complexity of pure steganography and of steganography combined with the Diffie-Hellman algorithm is nearly the same.

In this paper the presented steganography method is based on the spatial domain and encodes private information in an image by making small modifications to its pixels. The proposed method focuses on one particular popular technique, Least Significant Bit embedding. The paper emphasizes hiding information in online images. An example of a software tool that uses steganography to hide private data inside a public image file, as well as to detect such hidden private data, will be presented. In this paper the cryptography used is simple symmetric encryption and decryption. One of the main goals is to show the robustness of using three least significant bits per pixel.

IV. Least significant bit (LSB) insertion 

Suppose we have an 8-bit binary number 11111111. Changing the bit with the least value (i.e. the rightmost bit) will have the least effect on that binary number. That is why the rightmost bit is called the Least Significant Bit (LSB). The LSB of every byte can be replaced, and the effect on the overall file will be minimal.
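
A short Python sketch makes the point concrete: replacing the LSB of a byte changes its value by at most 1, whereas changing the leftmost bit changes it by 128. The helper name `replace_lsb` is an assumption introduced for illustration.

```python
def replace_lsb(byte_value, bit):
    """Return byte_value with its least significant bit replaced by bit (0 or 1)."""
    return (byte_value & 0b11111110) | (bit & 1)

original = 0b11111111                  # the 8-bit number used in the text (255)
after_lsb = replace_lsb(original, 0)   # only the rightmost bit is cleared
after_msb = original & 0b01111111      # clearing the leftmost bit instead

print(original - after_lsb)  # change caused by altering the LSB
print(original - after_msb)  # change caused by altering the MSB
```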

The binary data of the private information is broken up into 
bits and inserted into the LSB of each pixel in the image file. 

One way to implement that insertion is by special 
rearrangement of the color bytes. Suppose we have an 8-bit 
color image. A stego software tool can make a copy of an 
image palette. The copy is rearranged so that colors near each 
other are also near each other in the palette. The LSB of each 
pixel (i.e. 8-bit binary number) is replaced with one bit from 
the hidden message. A new color in the copied palette is 
found. The pixel is changed to the 8-bit binary number of the 
new color. 

The number of bits per pixel determines the number of distinct colors that can be represented. A 1 bit per pixel image uses 1 bit for each pixel, so each pixel can be either 1 or 0. Therefore we have: 1 bit per pixel = $2^1$ = 2 colors, 2 bits per pixel = $2^2$ = 4 colors, 3 bits per pixel = $2^3$ = 8 colors, 24 bits per pixel = $2^{24}$ ≈ 16.8 million colors. In this paper we assume that the picture has 24 bits per pixel.



As an example, suppose that we have three adjacent 
pixels (nine bytes) with the following encoding (see figure 1): 



Pixel 1 = 10010101 00001101 11001001
Pixel 2 = 10010110 00001111 11001010
Pixel 3 = 10011111 00010000 11001011

Fig. 1.



For example, to hide the 8 bits 01001000 that represent the character "H", we overlay these 8 bits on the LSBs of the first 8 of the 9 bytes of figure 1. As a consequence we get the following representation (see figure 2):



Pixel 1 = 10010100 00001101 11001000
Pixel 2 = 10010110 00001111 11001010
Pixel 3 = 10011110 00010000 11001011

Fig. 2. The bits in bold have been changed



Note that we have successfully hidden 8 bits at a cost of changing only 3 bits, or roughly 33%, of the LSBs. In this paper we use 24-bit color; therefore, the amount of change will be minimal and unnoticeable to the human eye. We leave as further work the question of the maximum number of bits per pixel that can be used to embed messages before the difference is noticed (see table 1).
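
The worked example of figures 1 and 2 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the message bits are written most significant bit first into the first eight of the nine cover bytes; the function name `embed_bits` is illustrative.

```python
def embed_bits(cover, bits):
    """Overlay each message bit on the LSB of the corresponding cover byte."""
    stego = list(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return stego

# The nine bytes of figure 1 (three 24-bit pixels).
cover = [0b10010101, 0b00001101, 0b11001001,
         0b10010110, 0b00001111, 0b11001010,
         0b10011111, 0b00010000, 0b11001011]

# Bits of the character "H", most significant bit first: 01001000.
message_bits = [int(c) for c in format(ord("H"), "08b")]
stego = embed_bits(cover, message_bits)

# Number of cover bytes whose LSB actually had to change.
changed = sum(1 for a, b in zip(cover, stego) if a != b)

print([format(b, "08b") for b in stego])
print(changed)
```

Only the bytes whose LSB differs from the message bit are altered, which is why 8 bits can be hidden while changing just 3 bytes.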



TABLE I.

Message   1st    Effects     2nd    Effects     3rd    Effects
Bit       LSB    on pixel    LSB    on pixel    LSB    on pixel
0         0      None        0      None        0      None
1         1      None        1      None        1      None
0         1      -1          1      -256        1      -65536
1         0      +1          0      +256        0      +65536
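
The pixel effects listed in Table I follow from treating the three bytes of a 24-bit pixel as one integer. The Python sketch below is illustrative only; the byte order (the first LSB belonging to the lowest byte) is an assumption chosen to be consistent with the table's 1, 256 and 65536 entries, and the function name is hypothetical.

```python
def pixel_value(b0, b1, b2):
    """Combine three bytes into one 24-bit integer, b0 being the lowest byte."""
    return b0 | (b1 << 8) | (b2 << 16)

# A pixel whose three bytes all have an LSB of 0.
base = pixel_value(0b00010000, 0b00010000, 0b00010000)

# Setting the LSB of each byte in turn raises the pixel value.
deltas = [
    pixel_value(0b00010001, 0b00010000, 0b00010000) - base,
    pixel_value(0b00010000, 0b00010001, 0b00010000) - base,
    pixel_value(0b00010000, 0b00010000, 0b00010001) - base,
]
print(deltas)
```

Clearing an LSB that was 1 gives the same magnitudes with a negative sign.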



The LSB description above is meant as an example. LSB insertion works well for gray-scale images. Gray-scale images have the benefit that data can be hidden in the least and second least significant bits with minimal effect on the image.

Some image manipulation techniques can make LSB insertion vulnerable. Converting a losslessly compressed image (e.g. GIF or BMP) to a lossy compressed image (e.g. JPEG) and then converting it back can destroy the data in the LSBs.
V. Encoding and Decoding Steps in LSB

Section V-A shows the steps needed to get and set the LSB of every byte. Section V-B shows the steps required to create the stego file. Figure 4 presents the flow chart of the encoding algorithm used in this paper. The decoding algorithm works the opposite way round, and its flow chart is given in figure 5.

A. Get and set bits at LSB algorithm (see figure 3) 

For each byte of the message, we have to: 

1) Grab a pixel.

2) Get the first bit of the message byte.

3) Get one color component of the pixel.

4) Get the first bit from the color component.

5) If the color bit is different from the message bit, set/reset it.

6) Do the same for the other seven bits. 

B. Create stego file 

1) Open the cover file into stream. 

2) Check if the cover file is bitmap file. 

3) Check if the cover file bitmap is 24 bits. 

4) Write the header of cover file to stego file (new 
stream) 

5) Add the length of message at the first (4) bytes of 
stego file (new stream) 

6) Encrypt the message using simple symmetric 
encryption key. 

7) Hide the message by using LSB algorithm (i.e. get 
and set). 
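
Steps 2 and 3 of the list above can be sketched in Python by inspecting the bitmap header. This sketch assumes the common BMP layout with a BITMAPINFOHEADER, in which the file starts with the signature "BM" and the bits-per-pixel field is a 16-bit little-endian value at byte offset 28. The function name `is_24bit_bmp` is hypothetical, and other header variants are not handled.

```python
import struct

def is_24bit_bmp(data):
    """Return True if data looks like a 24-bit BMP (BITMAPINFOHEADER layout assumed)."""
    if len(data) < 30 or data[:2] != b"BM":
        return False
    # Bits per pixel: unsigned 16-bit little-endian field at offset 28.
    (bits_per_pixel,) = struct.unpack_from("<H", data, 28)
    return bits_per_pixel == 24

# Build a minimal 54-byte header purely for demonstration.
header = bytearray(54)
header[0:2] = b"BM"
struct.pack_into("<H", header, 28, 24)

print(is_24bit_bmp(bytes(header)))
print(is_24bit_bmp(b"GIF89a" + bytes(48)))
```

A cover file that fails either check would be rejected before any embedding takes place.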
Fig. 3.

Fig. 4. Encoding Algorithm (flow chart: read the next byte of the cover file, extract a bit from the secret message character, replace the LSB of the cover byte with that bit, and write the new byte into the stego file)
(Flow chart of the decoding algorithm: input the length of the message, read the next byte of the stego file, extract its LSB, store the bit of the secret message character in the hidden array, and return the hidden array once all bits have been read.)



C. Example (please review the previous section on least significant bit (LSB) insertion)

Plain message character: H = 72, Key = 55, Encrypted message character = 127

    01001000
    00110111  XOR
    01111111

Get the first bit of the encrypted message (encrypted message character = 127):

    01111111
    00000001  AND
    00000001

Get the first byte in the cover image of Pixel 1: 11001001

Set the first bit of the encrypted message in the LSB of the first byte of the cover image of Pixel 1:

    11001001
    11111110  AND
    11001000
    00000001  OR
    11001001

Resulting first byte of Pixel 1: 11001001

Get the second bit of the encrypted message (encrypted message character = 127):

    01111111
    00000010  AND
    00000010

Shift right once to put the second bit as LSB: 00000001

Get the second byte in the cover image of Pixel 1: 00001101

Set the shifted second bit of the encrypted message in the LSB of the second byte of the cover image of Pixel 1:

    00001101
    11111110  AND
    00001100
    00000001  OR
    00001101

Resulting second byte of Pixel 1: 00001101

Get the third bit of the encrypted message (encrypted message character = 127):

    01111111
    00000100  AND
    00000100

Shift right twice to put the third bit as LSB: 00000001

Get the third byte in the cover image of Pixel 1: 10010100

Set the shifted third bit of the encrypted message in the LSB of the third byte of the cover image of Pixel 1:

    10010100
    11111110  AND
    10010100
    00000001  OR
    10010101

Resulting third byte of Pixel 1: 10010101

Original Pixel 1 = 10010100 00001101 11001001
Resulted Pixel 1 = 10010101 00001101 11001001

Continue as above for the rest of the bits of the encrypted message characters.
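
The arithmetic of this example can be checked directly. The Python lines below are an illustrative sketch that uses the byte values from the example; the helper name `set_lsb` is an assumption.

```python
def set_lsb(cover_byte, bit):
    """Clear the LSB with AND 11111110, then OR in the message bit."""
    return (cover_byte & 0b11111110) | bit

encrypted = ord("H") ^ 55                 # 72 XOR 55
bit0 = encrypted & 0b00000001             # first bit
bit1 = (encrypted & 0b00000010) >> 1      # second bit, shifted right once
bit2 = (encrypted & 0b00000100) >> 2      # third bit, shifted right twice

print(encrypted, bit0, bit1, bit2)
print(format(set_lsb(0b11001001, bit0), "08b"))
print(format(set_lsb(0b00001101, bit1), "08b"))
print(format(set_lsb(0b10010100, bit2), "08b"))
```

Because the first three bits of 127 are all 1, only the byte whose LSB was 0 (10010100) is altered.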



Fig. 5. Decoding Algorithm 
The C# functions for getting and setting a single bit are simple:

private static bool GetBit(byte b, byte position)
{
    // True if the bit at the given position (0 = LSB) is set.
    return (b & (byte)(1 << position)) != 0;
}

private static byte SetBit(byte b, byte position, bool newBitValue)
{
    byte mask = (byte)(1 << position);
    if (newBitValue)
    {
        return (byte)(b | mask);   // set the bit
    }
    else
    {
        return (byte)(b & ~mask);  // clear the bit
    }
}

A proposed pseudo-code for hiding messages:

for all bytes in the message stream
    read a byte from the key stream
    read a byte from the message stream
    XOR these bytes, store result in currentByte
    for(int index=0; index<8; index++)
        calculate the position of the next pixel
        get the colour of the pixel
        get the value of the bit at position index from currentByte
        set the lowest bit of the R, G or B value to the same value

A proposed pseudo-code for extracting hidden messages:

for all expected bytes of the message
    read a byte from the key stream
    initialize a variable (currentByte) as a buffer for extracted bits
    for(int index=0; index<8; index++)
        calculate the position of the next pixel
        get the colour of the pixel
        get the value of the lowest bit of R, G or B
        set the bit at position index in currentByte to the same value
    XOR currentByte with the key byte
    write currentByte into the message stream
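
The hiding and extracting procedures can be turned into a small round-trip test. The Python sketch below is a simplified illustration rather than the authors' tool: it treats the cover as a flat sequence of colour bytes, visits them in order instead of computing pixel positions, repeats the key cyclically, and stores bit `index` of each encrypted byte in the lowest bit of the next cover byte. The function names `hide` and `extract` are assumptions.

```python
def hide(cover, message, key):
    """Embed message (XORed with key) into the LSBs of cover; returns a new bytearray."""
    if len(message) * 8 > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    pos = 0
    for i, msg_byte in enumerate(message):
        current = msg_byte ^ key[i % len(key)]      # simple symmetric encryption
        for index in range(8):
            bit = (current >> index) & 1            # bit at position index
            stego[pos] = (stego[pos] & 0xFE) | bit  # overwrite the lowest bit
            pos += 1
    return stego

def extract(stego, length, key):
    """Recover length message bytes from the LSBs of stego."""
    out = bytearray()
    pos = 0
    for i in range(length):
        current = 0
        for index in range(8):
            current |= (stego[pos] & 1) << index    # rebuild the byte bit by bit
            pos += 1
        out.append(current ^ key[i % len(key)])     # undo the XOR encryption
    return bytes(out)

cover = bytes(range(256))   # stand-in for the colour bytes of an image
secret = b"Hi"
key = bytes([55])
stego = hide(cover, secret, key)
print(extract(stego, len(secret), key))
```

The receiver needs the message length and the key, which is why the file-creation steps store the length alongside the embedded data.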



VI. Experiments 

This section explains how the steganography application encodes a text message into an image file and decodes that message from the stego file.

The following figure shows the Main Menu screen with three buttons: the two buttons in the upper part of the screen are used for encrypting and decrypting a text message into an image file, and the third button, in the middle of the lower part of the screen, is used for encrypting and decrypting images.

Pressing the "Encrypt Text File" button or the "Dealing With Images" button leads to the first screen of the encoding process. After finishing the encoding process, click the "Decrypt The Stegano File" button in the upper right of the screen to continue with the decoding process.



Fig. 6. Main menu screen



A. Encoding: 
1) Step 1

In the first step, insert the path of the required image to be 
encoded in the "Source Image File" text box, or click the 
Browse button to select it. The selected image could be seen 
in the "Source Image Preview" picture box (see figure 7). 

The application shows the image size in bytes in the 
"Image Size " text box, and it also shows how many bytes you 
can hide inside this image. The maximum number of bytes 
you could hide will be displayed in the text box "You can hide 
up to" (see figure 7). Click button Next to proceed to the next 
step. 



Fig. 7. Encoding screen (step 1)
2) Step 2

In the next step two options are available (see fig. 8):

a) Either write the text message to be hidden in the image in the text box shown on the screen.

b) Or select the file that contains the text message to be hidden in the image by clicking the button "Browse" and inserting the path in the text box "File Name".

The number of bytes to be encoded in the image will be displayed in the text box "No. of Bytes". Click button "Next" to proceed to the next step.



Fig. 8. Encoding screen (step 2)

3) Step 3 

In the third step (see fig. 9) type the output image name 
that contains the encoded message in the text box "Stego File 
Name", and a security password in the text box "Password". 

Finally, click button "Finish" to create the target file and 
go to the next step. 




4) Step 4 

At this screen (see fig. 10), a comparison between the 
original image before encoding (Cover Image) and the output 
image after encoding (Stego Image) could be seen by the 
naked eye. 

Click button "Close" when finishing comparison. 



Fig. 10. Encoding screen (step 4)

If the button "Decrypt The Stegano File" on the Main Menu screen (Figure 6) has been pressed, the next screen leads the user through two steps to complete the decoding stage. These two steps are explained below:

B. Decoding: 
1) Step 1

At this stage, the encoded message, with the given stego file name, is stored in the main directory under the current path. Now go to the main menu (see fig. 6) and click, this time, the button "Decrypt". Click button "Browse" to select the newly created image (encoded image), and the application will show the encoded image size in bytes in the text box "Stego Image Size". The encoded image will also be shown in the picture box "Stego Image Preview". Type the same password that was entered while encoding the message (see Figure 9). The screen in figure 11 will then appear.

Fig. 9. Encoding screen (step 3)



Fig. 11. Decoding screen (step 1)

2) Step 2 

In this step, click the button "Decode" to decode the 
message. The encoded message will be extracted and will be 
shown in text box "The Extracted Message" (see fig. 12). 

Either save the message to a file by pressing the button 
"Save To File" or clear the message shown by pressing the 
button "Clear". Click button "Exit" to exit the 
application. 



Fig. 12. Decoding screen (step 2)



C. Results of Experiments 

Many changes can happen to an image when steganography techniques are applied. Some of the finer details in the image can be sacrificed due to the embedding of a message. That corruption of the original image is acceptable as long as the error between the original and the steganography image is tolerable. Three error metrics are used in this paper to compare the differences between the original image and the steganography image and to measure the degree of corruption. These three error metrics are:



1) The Mean Square Error (MSE) is the mean of the cumulative 
squared error between the stego image and the original image. 
Given a noise-free m×n monochrome image I (i.e. the original 
image) and its noisy approximation K (i.e. the stego 
image), MSE is defined as: 

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2$$

A lower MSE means a smaller error, so the goal is to find 
an image steganography scheme with a lower MSE; such a 
scheme is considered the better one. 

2) The Peak Signal to Noise Ratio (PSNR) is a measure of the 
peak error. PSNR is usually expressed on the logarithmic 
decibel scale and is most commonly used to measure the 
quality of a stego image. The signal in this case is the 
original data, and the noise is the error introduced by 
steganography. PSNR is an approximation to human perception 
of steganography quality. Here, MAX_I is the maximum possible 
pixel value of the image. When the pixels are represented 
using 24 bits per sample, MAX_I = 16777215 (2^24 - 1). 

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{MAX_I^2}{\mathrm{MSE}}\right) = 20 \log_{10}(MAX_I) - 10 \log_{10}(\mathrm{MSE})$$



From the above equations, there is an inverse relation 
between MSE and PSNR: a low MSE translates to a high PSNR. 
The higher the PSNR, the better the steganography. 

3) The Histogram Error (HE). An image histogram is a 
chart that shows the distribution of intensities in an indexed 
or grayscale image. The images used in this paper are 
colored. In order to work on all the color channels, the 
colored images are flattened into vectors before the image 
histogram function is applied. The image histogram function 
creates a histogram plot by making equally spaced bins, each 
representing a range of data values (i.e. grayscale levels). It 
then calculates the number of pixels within each range. 

The histogram shows the distribution of data values. We intend 
to measure the similarity of two images by computing the 
histogram error (HE) between them. It is calculated by 
measuring how far apart the normalized histograms of the two 
images are: the two normalized histogram vectors are 
subtracted from each other and the resulting vector is 
squared. The smaller the HE, the closer the two normalized 
histograms, and hence the two images, are to each other. Let 
the two images be denoted by Im1 (i.e. the original image) and 
Im2 (the stego image), and assume that both have the same m×n 
size. Calculate the normalized histograms hn1 and hn2 of 
Im1 and Im2, then calculate HE as follows: 

$$hn_1 = \frac{\mathrm{hist}(Im_1)}{\mathrm{numel}(Im_1)}, \qquad hn_2 = \frac{\mathrm{hist}(Im_2)}{\mathrm{numel}(Im_2)}$$

$$HE = \sum \left[ hn_1 - hn_2 \right]^2$$

where hist(·) denotes the histogram counts of an image and 
numel(·) its number of elements (pixels). 

The experiment compares (the original image vs. the stego 
image) against (the original image vs. a corrupted original 
image) in order to discover how far the stego image is from 
the original image. 

The corrupted original image is obtained by scaling the 
matrix entries of the original image (X) by a factor of 0.40, 
0.50, 0.90 and 0.9977. The resulting corrupted images are 
X*0.40, X*0.50, X*0.90 and X*0.9977. See Figure 14 to 
Figure 19 for each image and its associated histogram, and see 
also Table II. 
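
The three error metrics defined above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the tiny 2x2 sample images and the choice of one histogram bin per intensity level are assumptions made for the example.

```python
import math

def mse(original, stego):
    """Mean square error between two equally sized images (lists of rows)."""
    m, n = len(original), len(original[0])
    total = sum(
        (original[i][j] - stego[i][j]) ** 2
        for i in range(m)
        for j in range(n)
    )
    return total / (m * n)

def psnr(original, stego, max_value):
    """PSNR in decibels: 20*log10(MAX_I) - 10*log10(MSE)."""
    error = mse(original, stego)
    if error == 0:
        return math.inf  # identical images, no noise
    return 20 * math.log10(max_value) - 10 * math.log10(error)

def normalized_histogram(image, levels):
    """Fraction of pixels at each intensity level 0 .. levels-1."""
    pixels = [p for row in image for p in row]  # flatten the image into a vector
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return [c / len(pixels) for c in counts]

def histogram_error(original, stego, levels=256):
    """Sum of squared differences between the two normalized histograms."""
    hn1 = normalized_histogram(original, levels)
    hn2 = normalized_histogram(stego, levels)
    return sum((a - b) ** 2 for a, b in zip(hn1, hn2))

# Hypothetical 2x2 8-bit grayscale images, used only for illustration.
cover = [[10, 20], [30, 40]]
stego = [[11, 20], [30, 38]]
print(mse(cover, stego))                  # (1 + 0 + 0 + 4) / 4 = 1.25
print(psnr(cover, stego, max_value=255))
print(histogram_error(cover, stego))      # four bins differ by 0.25: 4 * 0.0625 = 0.25
```

For the 24-bit pixels discussed in the text, `max_value` would be 16777215 instead of 255.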



Figure 13 and Figure 13a show an example of an image and its 
histogram: 

Fig. 13. Example of an image 

Fig. 13a. Example of an image's histogram 

Fig. 14. Original Image 

Fig. 14a. Histogram of the original image 

Fig. 15. Corrupted image X*0.40 
TABLE II THE PEAK SIGNAL TO NOISE RATIO (PSNR), 
HISTOGRAM ERROR (HE) VALUES AND MEAN SQUARE ERROR 
(MSE) VALUES 

        X vs. X*0.40    X vs. X*0.50    X vs. Y 
PSNR    120.6227        120.6970        160.8882 
HE      7.1e-03         3.9e-03         0.0016387e-03 
MSE     243.8776        239.7416        0.0229 

        X vs. X*.90     X vs. X*.9977   X vs. Y 
PSNR    123.7007        158.1746        160.8882 
HE      0.85064e-03     0.023843e-03    0.0016387e-03 
MSE     120.0511        0.0429          0.0229 




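Experimental results show that, for a fair amount of embedded 
input (see Figure 8 and Figure 12), the Peak Signal to Noise 
Ratio (PSNR) between the original image and the stego image 
(X vs. Y in Table II) is substantially greater than the PSNR 
between the original image and any of the corrupted images. 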

VII. Conclusion and Further work 

This paper presents a steganography method based on Least 
Significant Bit embedding. The paper focuses on hiding 
private information in a public image. Examples of software 
tools that employ steganography to hide private data inside an 
image file, as well as software to detect such hidden data, were 
presented. The paper used simple symmetric encryption and 
decryption. The paper shows the robustness of using three 
least significant bits per pixel. 

As mentioned before, it remains as further work to determine 
the maximum number of bits per pixel that can be used to 
embed messages before the difference becomes noticeable. In 
other words, is there a mathematical relationship between the 
number of bits per pixel that make up the image's raster data 
and the number of bits that can be used in each pixel of the 
cover image to embed messages before the difference becomes 
noticeable? In our case, we used 3 least significant bits per 
pixel, where each pixel uses 24 bits to store the digital image. 
Abstract 

Image classification can be used to improve the accuracy of 
image query retrieval. Image classification is a well-known 
supervised learning technique, and an improved 
classification method increases the working efficiency of 
image query retrieval. To improve the classification 
technique, we use an RBF neural network function for 
better prediction of the features used in image retrieval. 
In image classification with the radial basis function 
(RBF) technique, colour content is represented by pixel 
values. This approach provides better results than the 
SVM technique in image representation. Through RBF, 
an image is represented by a matrix of the pixel values of 
its colour intensity. First, we use the RGB colour model, 
in which the red, green and blue colour intensity values 
are stored in the matrix. SVM with particle swarm 
optimization is implemented for image classification on 
the content of images. Results based on the proposed 
approach are found encouraging in terms of colour 
image classification accuracy. 

Keywords: RBF network, PSO technique, image 
classification. 



Introduction 

Image classification is defined as the task of classifying 
a number of images into (semantic) categories 
based on the available supervised data. The main 
objective of the digital image classification procedure is 
to categorize the pixels in an image into land cover 
classes. The output is a thematic image with a 
limited number of feature classes, as opposed to a 
continuous image with varying shades of gray or 
varying colors representing a continuous range of 
spectral reflectance [20]. 

The RBF function is a neural network approach. It is 
based on function values that depend on the distance 
from the origin; this distance represents the colour 
intensity of the image. Image features are colour, 
texture, shape and size. Large collections of images are 
becoming available to the public, from photo 
collections to web pages or even video databases. 
Since visual media requires large amounts of memory 
and computing power for processing and storage, 
there is a need to efficiently index and retrieve visual 
information from image databases [21]. 

The idea of RBF derives from the theory of function 
approximation. We have already seen how Multi-Layer 
Perceptron (MLP) networks with a hidden layer 
of sigmoid units can learn to approximate functions. 
RBF networks take a slightly different approach [11, 
17]. Their main features are: they are two-layer 
feed-forward networks; the hidden nodes implement a 
set of radial basis functions (e.g. Gaussian functions); 
the output nodes implement linear summation 
functions as in an MLP; and the network training is 
divided into two stages: first the weights from the 
input to the hidden layer are determined, and then the 
weights from the hidden to the output layer [16]. 
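
The two-stage structure described above can be illustrated with a small Python sketch. It is not the method used in this paper: the function names are hypothetical, stage one simply places one Gaussian centre on every training sample, and stage two fits the linear output weights by plain batch gradient descent. The XOR data set is used only because a linear model without the hidden layer cannot fit it.

```python
import math

def gaussian(x, centre, width):
    """Gaussian radial basis function of the distance between x and centre."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-dist_sq / (2 * width ** 2))

def hidden_layer(x, centres, width):
    """Outputs of all hidden nodes for one input vector."""
    return [gaussian(x, c, width) for c in centres]

def train_rbf(samples, targets, width=0.5, rate=0.5, epochs=200):
    # Stage 1: fix the hidden layer. Assumption: every sample becomes a centre.
    centres = [list(s) for s in samples]
    activations = [hidden_layer(s, centres, width) for s in samples]
    # Stage 2: fit the hidden-to-output weights of the linear summation node.
    weights = [0.0] * len(centres)
    for _ in range(epochs):
        errors = [
            sum(w * a for w, a in zip(weights, row)) - t
            for row, t in zip(activations, targets)
        ]
        for k in range(len(weights)):
            grad = sum(e * row[k] for e, row in zip(errors, activations))
            weights[k] -= rate * grad
    return centres, weights

def predict(x, centres, weights, width=0.5):
    """Linear summation of the hidden-node outputs."""
    return sum(w * a for w, a in zip(weights, hidden_layer(x, centres, width)))

xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]
centres, weights = train_rbf(xor_inputs, xor_targets)
for x in xor_inputs:
    print(x, round(predict(x, centres, weights), 3))
```

In practice the centres are often chosen by clustering and the output weights by a least-squares solve; both are swapped for simpler stand-ins here to keep the sketch dependency-free.
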
Image classification using the SVM classifier, the RBF 
classifier and the PSO optimization technique, based on 
the content of images, provides comparative results. 
Classification plays an important and challenging role 
in the efficient indexing and extraction of large numbers 
of colour images. The main aim of this research work 
is to find a suitable representation for images; 
classification generally requires comparison of images 
depending on certain useful features [5]. 

Literature Survey 

(1) Xiao-Qing Shang, Guo-Xiang Song and Biao Hou in 
China in 2003. They carried out work on "Content 
based texture image classification." A new method 
for content based texture image classification is 
proposed using a support vector machine, 
which combines the characteristics of the Brushlet and 
Wavelet transforms. 

(2) P. J. Griffiths, J. S. Marsland and W. Eccleston 
in Liverpool in 2003. They discussed "A 
Study of Noise in PWM Neural Networks". This 
paper shows the effect of introducing noise to the 
weight set and at the input to the neuron. The MLP 
investigated is tolerant to noise added at the input to 
the neuron and therefore could be implemented using 
the PWM neural network with the RC time constant 
set close to the PWM period. 

(3) Keiji Yanai in Japan in 2003. He worked on 
"Generic Image Classification Using Visual 
Knowledge on the Web." The paper describes 
a generic image classification system with an 
automatic knowledge acquisition mechanism from 
the World Wide Web. 

(4) Luca Piras and Giorgio Giacinto at the University of 
Cagliari, Italy, in 2010. They proposed 
"Unbalanced learning in content-based image 
classification and retrieval." The paper proposes 
a technique aimed at artificially increasing the 
number of examples in the training set in order to 
improve the learning capabilities, reducing the 
unbalance between the semantic class of interest and 
all other images. The proposed approach is tailored to 
classification and relevance feedback techniques 
based on the Nearest-Neighbor paradigm. A number 
of new points in the feature space are created based 
on the available training patterns, so that they better 
represent the distribution of the semantic class of 
interest. 

(5) Saurabh Agrawal, Nishchal K Verma, Prateek 
Tamrakar and Pradip Sircar at the Indian Institute 
of Technology Kanpur, India, in 2011. They worked on 
"Content Based Color Image Classification using 
SVM." They implement image classification with an 
SVM classifier on the colour content of the image, 
using the optimal hyperplane technique of the support 
vector machine. In this paper, colour image 
classification is performed on features extracted from 
histograms of colour components. The benefits of 
using colour image histograms are better efficiency and 
insensitivity to small changes in camera viewpoint, 
i.e. translation and rotation. 

(6) Siu-Yeung Cho at the University of Nottingham 
Ningbo China in 2011. He researched "Content 
Based Structural Recognition for Flower Image 
Classification." In this paper, a study was made on the 
development of a content based image retrieval system 
to characterize flower images efficiently. In this 
system, a method of structural pattern recognition 
based on a probabilistic recursive model is 
proposed to classify flower images. 

(7) Giuseppe Amato, Fabrizio Falchi and Claudio 
Gennaro in Pisa, Italy in 2011. They carried out work 
on "Geometric consistency checks for kNN based 
image classification relying on local features." The 
paper proposes a technique that 
allows one to use access methods for similarity 
searching, such as those exploiting metric space 
properties, in order to perform kNN classification 
with geometric consistency checks. 

(8) Wang Xing Yuan, Chen Zhi feng and Yunjiao 
Jiao in China in 2011. They carried out work on "An 
effective method for colour image retrieval based on 
texture." They proposed an effective colour 
image retrieval method based on texture, which uses 
the colour co-occurrence matrix to extract the texture 
feature and measure the similarity of two colour 
images. 

(9) Yu Zeng, Jixian Zhang, J.L. Van Genderen and 
Guangliang Wang at the Chinese Academy of Surveying 
and Mapping, Beijing, P.R. China, in 2012. They 
researched "SVM-based Multi-textural Image 
Classification and Its Uncertainty Analysis." This 
paper presents a supervised image classification 
method which is based on multiple and multi-scale 
texture features and support vector machines (SVM). 

(10) Masato Yonekawa and Hiroaki Kurokawa at the 
School of Computer Science, Tokyo University of 
Technology, Tokyo, Japan, in 2012. They proposed 
"The Content-Based Image Retrieval using the Pulse 
Coupled Neural Network." In this paper they 
proposed a learning method to define the parameters 
in the PCNN for image matching. The learning 
method improves the performance of image matching 
using the PCNN. They also note that Content-Based 
Image Retrieval (CBIR) has recently been studied 
extensively. 

(11) Methaq Gaata, Sattar Sadkhn and Saad Hasson, 
Montpellier, France, in 2012. They worked on "Reference 
Quality Metric Based on Neural Network for 
Subjective Image Watermarking Evaluation." In this 
work, the goal of image quality assessment (IQA) 
research is to design computational models which have 
the ability to provide an automatic and efficient way to 
predict visual quality in a way that is consistent with 
subjective human evaluation. 

(12) R. Venkata Ramana Chary, D. Rajya Lakshmi 
and K.V.N. Sunitha, Tiruvannamalai, TN, India, in 
December 2012 carried out work on "Image 
Searching Based on Image Mean Distance Method." 
They discussed that as the size of a database 
increases, finding similar images becomes a big task, 
and researchers have to provide efficient solutions. 
Content-Based Image Retrieval (CBIR) systems are 
used in order to retrieve images from an image dataset. 
Their proposed method utilizes a clustering 
method to retrieve the images. 

(13) Mihir Jain, Rachid Benmokhtar and Patrick 
Gros, INRIA Rennes, in 2012. They carried out work 
on "Hamming Embedding Similarity-based Image 
Classification." They propose a novel image 
classification framework based on patch matching. 
More precisely, they adapt the Hamming Embedding 
technique, first introduced for image search to 
improve the bag-of-words representation, and 
propose a mapping method to cast the scores output 
by the Hamming technique into a proper similarity 
space. 

(14) Amel Znaidia, Aymen Shabou, Adrian Popescu, 
Herve le Borgne and Celine Hudelot in France in 
2012. They carried out work on "Multimodal Feature 
Generation Framework for Semantic Image 
Classification." They present a unified framework which 
mixes textual and visual information in a seamless manner. 

(15) Feilong Cao, Bo Liu and Dong Sun Park in China 
in 2012. They researched "Image classification 
based on effective extreme learning machine." In this 
work, a new image classification method is proposed 
based on extreme k-means (EKM) and an effective 
extreme learning machine. The proposed process 
decomposes the image with the curvelet transform and 
reduces dimensionality with discriminative locality 
alignment (DLA). 

(16) Yan Leng, Xinyan Xu and Guanghui Qi in China 
in 2013. They carried out work on "Combining active 
learning and semi supervised learning to construct 
SVM classifier." In this work, the active semi-supervised 
SVM algorithm performs much better than the pure 
SVM algorithm. 

(17) Ming Hui Cheng, Kao Shing Hwang, Jyh Horng 
Jeng and Nai Wei Lin, Taiwan, in 2013. They worked on 
"Classification based video super resolution using 
artificial neural networks." In this study, they 
proposed a method to enhance low-resolution frames to 
high-resolution frames. The proposed method consists of 
four main steps: classification, motion-trace volume 
collection, temporal adjustment and ANN prediction. The 
classifier is designed based on the edge properties of a 
pixel in the frame to identify the spatial information. 

(18) Marko Tkalcic, Ante Odic, Andrej Kosir and 
Jurij Tasic, Member of IEEE, in February 2013. They 
carried out work on "Affective Labeling in a Content-Based 
Recommender System for Images." The paper 
presents a methodology for the implicit acquisition of 
affective labels for images. It is based on an emotion 
detection technique that takes as input the video 
sequences of users' facial expressions. It extracts 
Gabor low-level features from the video frames and 
employs a k-nearest-neighbour machine learning 
technique to generate affective labels in the 
valence-arousal-dominance space. 

(19) Serdar Arslan, Adnan Yazici, Ahmet 
Sacan, Ismail H. Toroslu and Esra Acar in USA in 
2013. They proposed "Comparison of 
feature based and image registration based retrieval 
of image data using multidimensional data access 
methods." They proposed that multidimensional 
scaling can achieve comparable accuracy while 
speeding up the query times significantly by allowing 
the use of spatial access methods. 

Comparison between RBF network and other 
classification techniques: 

(1) Other classification techniques do not provide 
optimal results, and some of them are traditional. 
The radial basis function network is an artificial 
neural network technique. It provides optimal 
classification, which is based on supervised learning 
and a training data set. 

(2) The SVM classifier suffers from two 
problems: 

(i) How to choose the optimal input feature subset. 

(ii) How to set the best kernel parameters. 

These problems influence the performance and 
accuracy of the support vector machine. Pre-sampling 
of features reduces the feature selection 
process of the support vector machine for image 
classification. 

(3) To improve the classification technique, 
we used an RBF neural network function for better 
prediction of the features used in image retrieval. Our 
proposed method optimizes the feature selection 
process and finally sends the data to a multiclass 
classifier for classification. 
Conclusion 

After surveying the papers, we find that an image is 
classified through its content, such as colour, texture, 
shape and size. In this paper, feature extraction is 
based on the colour, shape and size content of the 
image. The extracted features are optimized, and the 
optimal features are classified by an RBF classifier, 
i.e. by an RBF neural network. The radial basis 
function (RBF) network is an artificial neural 
network technique that provides supervised 
classification of image features. The Gaussian 
distribution function is used in the hidden units of the 
RBF network. Classification of the optimal features of 
the image is implemented by the RBF algorithm. In the 
radial basis function, feature values are represented in 
matrix form, and the distance of a pixel is measured 
from the origin. This technique provides better 
performance and accuracy than the KNN 
and SVM classification approaches. To improve on 
the support vector machine classifier, we 
used an RBF neural network and the PSO optimization 
technique. Our empirical results show better 
efficiency than the support vector machine 
classifier. This approach provides better results for 
image classification based on colour features. 

Future scope of work 

A radial basis function network has a hidden 
processing unit to which different types of training 
algorithm can be applied. Image quality can be 
increased through modification of the algorithm. 
Classification time can also be improved compared to 
other classification techniques. The RBF network can be 
applied to other types of classification in image 
processing. The accuracy of image classification plays a 
vital role in the medical field. Since the RBF network and 
the PSO optimization technique use trained features, 
further enhancements of this technique are possible in 
the future. 
Abstract — This paper addresses various image 
compression techniques. On the basis of an analysis of these 
techniques, it presents a survey of existing research papers 
and analyzes different types of existing image compression 
methods. Compression of an image is significantly different 
from compression of raw binary data, so different types of 
techniques are used for image compression. The question then 
arises of how to compress an image and which type of 
technique to use. For this purpose, two basic types of method 
have been introduced, namely lossless and lossy image 
compression techniques. At present, some other techniques 
are added to the basic methods; in some areas, neural 
networks and genetic algorithms are used for image 
compression. 

Keywords-Image Compression; Lossless; Lossy; Redundancy; 
Benefits of Compression. 



I. Introduction 



An image is an artifact that depicts or records visual 
perception. Images are important documents today; to work 
with them in some applications, they need to be 
compressed. How much compression is needed depends on the 
aim of the application. Image compression plays a very 
important role in the transmission and storage of image data as 
a result of bandwidth and storage limitations. The main aim of 
image compression is to represent an image in the fewest 
number of bits without losing the essential information content 
within the original image. Compression [3] techniques are being 
rapidly developed to compress large data files such as images. 
With the increasing growth of technology, a huge amount of 
image data must be handled and stored properly, and 
efficient techniques usually succeed in compressing images. 
There are several algorithms that perform this compression in 
different ways; some are lossless and some are lossy. Lossless 
algorithms keep the same information as the original image, 
whereas in lossy algorithms some information is lost when 
compressing the image. Some of these compression techniques 
are designed for specific kinds of images, so they will not be 
as good for other kinds of images. Some algorithms let us 
change a few of the parameters they use in order to adjust the 
compression better to the image. Image compression 
is an application of data compression that encodes the original 
image with fewer bits. The objective of image compression [1] 
is to reduce the redundancy of the image and to store or 
transmit data in an efficient form. 
The compression ratio is defined as follows: 

$$C_R = \frac{N_1}{N_2}$$

where N1 is the amount of data in the actual image and N2 is 
the amount of data in the compressed image. 



II. Image compression 

Image compression addresses the problem of reducing the 
amount of information required to represent a digital image. It 
is a process intended to yield a compact representation of an 
image, thereby reducing the image storage and transmission 
requirements. Every image has redundant data. 
Redundancy means the duplication of data in the image: it may 
be a pixel repeated across the image or a pattern that is 
repeated frequently in the image. Image compression 
takes advantage of this redundant information; reducing 
redundancy saves storage space. In image compression, 
three basic data redundancies can be identified and exploited, 
and compression is achieved when one or more of them are 
reduced or eliminated. 

A. Inter Pixel Redundancy 

In an image, neighbouring pixels are not statistically 
independent, owing to the correlation between the 
neighbouring pixels of the image. This type of redundancy is 
called inter-pixel redundancy, sometimes also called 
spatial redundancy. It can be exploited in 
several ways, one of which is by predicting a pixel value based 
on the values of its neighbouring pixels. In order to do so, the 
original 2-D array of pixels is usually mapped into a different 
format, e.g. an array of differences between adjacent pixels. If 
the original image [20] pixels can be reconstructed from the 
transformed data set, the mapping is said to be reversible. 
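
A minimal Python sketch of such a reversible mapping is given below for a single row of pixels. The function names and the sample row are illustrative assumptions; the point is only that the differences between adjacent pixels are small numbers and that the original row can be rebuilt from them exactly.

```python
def to_differences(row):
    """Map a row of pixels to its first value followed by adjacent differences."""
    if not row:
        return []
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

def from_differences(diffs):
    """Invert to_differences by accumulating the differences."""
    if not diffs:
        return []
    row = [diffs[0]]
    for d in diffs[1:]:
        row.append(row[-1] + d)
    return row

row = [100, 101, 101, 103, 102]
diffs = to_differences(row)
print(diffs)                    # [100, 1, 0, 2, -1]
print(from_differences(diffs))  # [100, 101, 101, 103, 102]
```

Because neighbouring pixels are correlated, most differences cluster around zero, which makes them cheaper to encode with the variable-length codes described in the next subsection.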






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 
Vol. 11, No. 11, November 2013 



B. Coding Redundancy 

Coding redundancy is reduced by using variable-length code 
words selected to match the statistics of the original source, 
in this case the image itself or a processed version of its pixel 
values. This type of coding is always reversible and is usually 
implemented using lookup tables (LUTs). Examples of image 
coding schemes that exploit coding redundancy are Huffman 
codes and the arithmetic coding technique. 

C. Psycho Visual Redundancy 

Many experiments on the psychophysical aspects of human 
vision have shown that the human eye does not respond with 
equal sensitivity to all incoming visual information; some 
pieces of information are more important than others. Most of 
the image coding algorithms in use today exploit this type of 
redundancy, such as the Discrete Cosine Transform (DCT) 
based algorithm at the heart of the JPEG encoding standard. 

III. Benefits of compression 

• It provides a potential cost saving associated with 
sending less data over the switched telephone network, 
where the cost of the call is usually based upon 
its duration. 

• It not only reduces storage requirements but also 
overall execution time. 

• It reduces the probability of transmission errors since 
fewer bits are transferred. 

• It provides a level of security against unlawful 
monitoring. 

IV. Comparison between lossless and lossy techniques 

In lossless compression schemes, the reconstructed image, 
after compression, is numerically identical to the original 
image. However, lossless compression can only achieve a 
modest amount of compression. An image reconstructed 
following lossy compression contains degradation relative to 
the original. Often this is because the compression scheme 
completely discards redundant information. However, lossy 
schemes are capable of achieving much higher compression. 

A. Types of Image Compression 

On the basis of our requirements, image compression 
techniques are broadly divided into the following two major 
categories. 

• Lossless image compression 

• Lossy image compression 



1) Lossless Compression Techniques: 
Lossless compression compresses the image by encoding all 
the information from the original file, so when the image is 
decompressed, it is exactly identical to the original 
image. Examples of lossless [2] image compression formats are 
PNG and GIF. When to use a certain image compression format 
really depends on what is being compressed. 

a) Run Length Encoding: Run-length encoding (RLE) is 
a very simple form of image compression in which runs of 
data are stored as a single data value and count, rather than as 
the original run. It is used for sequential [19] data and is 
helpful for repetitive data. This technique replaces sequences 
of identical symbols (pixels), called runs, with shorter codes. 
The run-length code for a grayscale image is represented by a 
sequence {Vi, Ri}, where Vi is the intensity of a pixel and Ri is 
the number of consecutive pixels with the intensity Vi, as 
shown in the figure. This is most useful on data that contains 
many such runs, for example simple graphic images such as 
icons, line drawings and animations. It is not useful with files 
that do not have many runs, as it could greatly increase the file 
size. Run-length encoding performs lossless image 
compression [4]. Run-length encoding is used in fax machines. 

b) Entropy Encoding: In information theory, entropy 
encoding is a lossless data compression scheme that is 
independent of the specific characteristics of the medium. One 
of the main types of entropy coding creates and assigns a 
unique prefix-free code to each unique symbol that occurs in 
the input. These entropy encoders then compress the image by 
replacing each fixed-length input symbol with the 
corresponding variable-length prefix-free output codeword. 

c) Huffman Encoding: In computer science and 
information theory, Huffman coding is an entropy encoding 
algorithm used for lossless data compression. It was developed 
by Huffman. Huffman coding [8] today is often used as a 
"back-end" to some other compression methods. The term 
refers to the use of a variable-length code table for encoding a 
source symbol, where the variable-length code table has been 
derived in a particular way based on the estimated probability 
of occurrence of each possible value of the source symbol. 
The pixels in the image are treated as symbols. The symbols 
that occur more frequently are assigned a smaller number of 
bits, while the symbols that occur less frequently are assigned 
a relatively larger number of bits. Huffman code is a prefix 
code. This means that the (binary) code of any symbol is not 
the prefix of the code of any other symbol. 
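
The run-length and Huffman schemes described above can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function names and the sample pixel rows are assumptions, and the Huffman builder returns only a code table, not a packed bit stream.

```python
import heapq
from collections import Counter
from itertools import groupby

def rle_encode(pixels):
    """Return (value, run length) pairs, i.e. the {Vi, Ri} sequence."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (value, run length) pairs back into the original pixels."""
    return [value for value, count in pairs for _ in range(count)]

def huffman_codes(symbols):
    """Build a prefix-free code table from the symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

row = [255, 255, 255, 255, 255, 0, 0, 255]
print(rle_encode(row))          # [(255, 5), (0, 2), (255, 1)]

pixels = [0] * 6 + [128] * 3 + [255]
codes = huffman_codes(pixels)
print(codes)                    # the most frequent value 0 gets the shortest code
print(sum(len(codes[p]) for p in pixels), "bits instead of", 8 * len(pixels))
```

In the second example the ten 8-bit pixels need 14 bits after Huffman coding, because the value 0 (six occurrences) receives a one-bit code and the two rarer values receive two-bit codes.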



d) Arithmetic Coding: Arithmetic coding is a form of 
entropy encoding used in lossless data compression. Normally, 
a string of characters such as the words "hello there" is 
represented using a fixed number of bits per character, as in 
the ASCII code. When a string is converted to arithmetic 
encoding, frequently used characters are stored with fewer 
bits and less frequently occurring characters are stored 
with more bits, resulting in fewer bits used in total. Arithmetic 
coding differs from other forms of entropy encoding such 
as Huffman coding [10] in that, rather than separating the input 
into component symbols and replacing each with a code, 
arithmetic coding encodes the entire message into a single 
number. 
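
The idea of mapping a whole message to one number can be shown with a small Python sketch. It uses exact fractions instead of the finite-precision integer arithmetic of practical coders, and the function names are hypothetical; it illustrates the interval-narrowing principle only and does not produce a bit stream.

```python
from collections import Counter
from fractions import Fraction

def build_intervals(message):
    """Give each symbol a sub-interval of [0, 1) proportional to its frequency."""
    freq = Counter(message)
    total = len(message)
    intervals, low = {}, Fraction(0)
    for symbol in sorted(freq):
        width = Fraction(freq[symbol], total)
        intervals[symbol] = (low, low + width)
        low += width
    return intervals

def arithmetic_encode(message, intervals):
    """Narrow [0, 1) once per symbol and return a number inside the result."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        span = high - low
        s_low, s_high = intervals[symbol]
        low, high = low + span * s_low, low + span * s_high
    return (low + high) / 2  # any value in [low, high) identifies the message

def arithmetic_decode(code, intervals, length):
    """Recover `length` symbols by repeatedly locating and rescaling the code."""
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in intervals.items():
            if s_low <= code < s_high:
                out.append(symbol)
                code = (code - s_low) / (s_high - s_low)
                break
    return "".join(out)

message = "hello there"
intervals = build_intervals(message)
code = arithmetic_encode(message, intervals)
print(code)
print(arithmetic_decode(code, intervals, len(message)))
```

Frequent symbols own wide sub-intervals, so they shrink the working interval less and therefore cost fewer bits when the final number is written out.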

e) Lempel-Ziv-Welch Coding: Lempel-Ziv-Welch 
(LZW) is a universal lossless data compression algorithm 
created by Abraham Lempel, Jacob Ziv, and Terry Welch. It 
was published by Welch in 1984 as an improved 
implementation of the LZ78 algorithm published by Lempel 
and Ziv in 1978. LZW is a dictionary based coding. Dictionary 
based coding can be static or dynamic. In static dictionary 
coding, the dictionary is fixed during the encoding and 
decoding processes. In dynamic dictionary coding, the 
dictionary is updated on the fly. The algorithm is simple to 
implement, and has the potential for very high throughput in 
hardware implementations. It was the algorithm of the widely 
used UNIX file compression utility compress, and is used in the 
GIF image format. LZW compression became the first widely 
used universal data compression method on computers. A 
large English text file can typically be compressed via LZW to 
about half its original size. 
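
A compact Python sketch of the dynamic dictionary behaviour is shown below. It is a simplified illustration with hypothetical function names: codes are kept as a list of integers, and the variable-width bit packing and dictionary-size limits used by real formats such as GIF are left out.

```python
def lzw_encode(data: bytes):
    """Encode bytes as a list of integer codes, growing the dictionary on the fly."""
    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decode(codes):
    """Rebuild the bytes, reconstructing the same dictionary while decoding."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[:1]  # phrase defined by this very step
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(data)
print(len(data), "bytes ->", len(codes), "codes")
print(lzw_decode(codes) == data)
```

The decoder never receives the dictionary; it rebuilds it from the code stream, which is why no table has to be stored with the compressed data.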

2) Lossy Compression Techniques: 
Lossy compression, as the name implies, leads to the loss of 
some information. The compressed image is similar to the 
original uncompressed image but not identical to it, because in 
the process of compression [9] some information concerning the 
image has been lost. Lossy techniques are typically suited to 
images. The most common example of lossy compression is 
JPEG. Algorithms that restore the representation exactly as the 
original image are known as lossless techniques; in lossy 
techniques, the reconstructed image is only an approximation 
of the original image, hence the need to measure the quality of 
the image. Lossy compression techniques provide a higher 
compression ratio than lossless compression. 

Major performance considerations of a lossy compression scheme 
include: 

• Compression ratio 

• Signal to noise ratio 

• Speed of encoding & decoding 
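
The first two measures can be computed as sketched below. The sketch assumes images are given as flat sequences of 8-bit pixel values and uses the peak signal to noise ratio (PSNR), the variant of signal to noise ratio most often reported for images; the function names are illustrative.

```python
import math


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Size of the original divided by the size of the compressed data."""
    return original_bytes / compressed_bytes


def psnr(original, reconstructed, peak: int = 255) -> float:
    """Peak signal to noise ratio in dB between two equal-length pixel lists."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images, i.e. lossless reconstruction
    return 10 * math.log10(peak ** 2 / mse)
```

A higher compression ratio means a smaller file, and a higher PSNR means the reconstruction is closer to the original; lossy schemes trade one against the other.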

Lossy image compression techniques include the following 
schemes: 

a) Scalar Quantization: The most common type of 
quantization is known as scalar quantization. Scalar 
quantization, typically denoted as Y = Q(x), is the process of 
using a quantization function Q to map a scalar 
(one-dimensional) input value x to a scalar output value Y. 
Scalar quantization can be as simple and intuitive as rounding 
high-precision numbers to the nearest integer, or to the nearest 
multiple of some other unit of precision. 
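
As a minimal sketch, a uniform scalar quantizer Y = Q(x) that rounds to the nearest multiple of a step size could look like this. The name `quantize` and the parameter `step` are assumptions for illustration, and ties are rounded upward.

```python
import math


def quantize(x: float, step: float = 1.0) -> float:
    """Y = Q(x): map x to the nearest multiple of `step` (ties round up)."""
    return step * math.floor(x / step + 0.5)
```

With `step=1.0` this rounds to the nearest integer; a larger step such as 16 maps every 8-bit intensity onto one of a small number of levels, which is where the information loss comes from.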



b) Vector Quantization: Vector quantization (VQ) is a 
classical quantization technique from signal processing which 
allows the modeling of probability density functions by the 
distribution of prototype vectors. It was originally used 
for image compression. It works by dividing a large set of 
points (vectors) into groups having approximately the same 
number of points closest to them. The density matching 
property of vector quantization is powerful, especially for 
identifying the density of large and high-dimensioned data. 
Since data points are represented by the index of their closest 
centroid, commonly occurring data have low error, and rare 
data high error. This is why VQ is suitable for lossy data 
compression. It can also be used for lossy data correction 
and density estimation. 
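
The following sketch illustrates how data points are represented by the index of their closest centroid. The text does not say how the centroids (the codebook) are obtained; the sketch assumes a plain k-means (Lloyd) iteration, a common choice, and all function names are illustrative.

```python
def nearest(vector, codebook):
    """Index of the codebook vector closest to `vector` (squared distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )


def train_codebook(vectors, initial_codebook, iterations=10):
    """Refine the centroids with k-means style updates (an assumed method)."""
    codebook = [tuple(c) for c in initial_codebook]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for vector in vectors:
            clusters[nearest(vector, codebook)].append(vector)
        codebook = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, codebook)
        ]
    return codebook


def vq_encode(vectors, codebook):
    """Compress: keep only the index of the closest centroid."""
    return [nearest(vector, codebook) for vector in vectors]


def vq_decode(indices, codebook):
    """Decompress: replace each index with its centroid (lossy)."""
    return [codebook[i] for i in indices]
```

Only the indices and the codebook need to be stored, so the saving grows with the number of vectors that share each centroid.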

V. Literature survey 

In 2010, Jau-Ji Shen et al. presented a vector quantization based 
image compression technique [5]. They adjust the encoding of 
the difference map between the original image and its restored 
VQ-compressed version. Their experimental results show that, 
although their scheme needs to provide extra data, it can 
substantially improve the quality of VQ-compressed images, 
and can further be adjusted, depending on the difference map, 
from lossy compression to lossless compression. 

In 2011, Suresh Yerva et al. presented an approach to lossless 
image compression using the novel concept of image folding 
[6]. The proposed method uses the property of adjacent 
neighbor redundancy for prediction. Column folding followed 
by row folding is applied iteratively on the image until the 
image size reduces to a smaller pre-defined value. The 
proposed method is compared with existing standard lossless 
image compression algorithms, and the results show 
comparable performance. Data folding is a simple approach 
to compression that provides good compression efficiency and 
has lower computational complexity than the standard SPIHT 
technique for lossless compression. 

In 2012, Firas A. Jassim et al. presented a novel method for 
image compression called the five module method (FMM) [7]. 
The method converts each pixel value in 8x8 blocks into a 
multiple of 5 for each of the RGB arrays. The new values can 
then be divided by 5, giving values that need fewer bits per 
pixel than the original 8 bits and therefore less storage 
space. The paper demonstrates the potential of FMM based 
image compression techniques. The advantage of the method is 
that it provides a high PSNR (peak signal to noise ratio), 
although its CR (compression ratio) is low. The method is 
appropriate for bi-level images, such as black and white 
medical images, where each pixel is represented by one byte 
(8 bits). As a recommendation, a variable module method 
(X)MM, where X can be any number, may be constructed in 
later research. 
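
The core arithmetic of the five module method can be sketched as below. This covers only the per-pixel step described above, assuming rounding to the nearest multiple of 5; it is not the authors' full scheme, and the function names are illustrative.

```python
def fmm_encode(pixel: int) -> int:
    """Round an 8-bit value to the nearest multiple of 5, then divide by 5."""
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be an 8-bit value")
    return (pixel + 2) // 5  # result lies in 0..51


def fmm_decode(code: int) -> int:
    """Multiply back by 5 to approximate the original pixel value."""
    return code * 5
```

The encoded values range from 0 to 51, so they fit in 6 bits instead of 8, and under this rounding assumption each reconstructed pixel differs from the original by at most 2. That is consistent with the reported combination of high PSNR and low compression ratio.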

In 2012, Ashutosh Dwivedi et al. presented a novel hybrid 
image compression technique [11]. The technique inherits the 
properties of localizing the global spatial and frequency 
correlation from wavelets, and the classification and function 
approximation tasks from a modified forward-only counter 
propagation neural network (MFO-CPN). They explore the use 
of MFO-CPN networks to predict wavelet coefficients for 
image compression, combining the classical wavelet based 
method with MFO-CPN, and use several tests to investigate 
the usefulness of the proposed scheme. The performance of the 
proposed network is tested for three discrete wavelet 
transform functions. Their analysis shows that the Haar 
wavelet results in a higher compression ratio, but the quality 
of the reconstructed image is not good. On the other hand, db6 
with the same number of wavelet coefficients leads to a higher 
compression ratio with good quality. Overall, they found that 
the db6 wavelet outperforms the other two in image 
compression. 

In 2012, Yi-Fei Tan et al. presented an image compression 
technique based on reference points coding with threshold 
values [12]. The paper brings forward an image compression 
method capable of performing both lossy and lossless 
compression. A threshold value is associated with the 
compression process: different compression ratios can be 
achieved by varying the threshold value, lossless compression 
is performed if the threshold is set to zero, and lossy 
compression is achieved when the threshold assumes positive 
values. The proposed method allows the quality of the 
decompressed image to be determined during the compression 
process. Further study can be performed to calculate the 
optimal threshold value T that should be used. 

In 2012, S. Sahami et al. presented a bi-level image 
compression technique using neural networks [13]. It is a lossy 
image compression technique. In this method, the locations of 
the pixels of the image are applied to the input of a multilayer 
perceptron neural network. The output of the network denotes 
the pixel intensity, 0 or 1. The final weights of the trained 
neural network are quantized, represented by a few bits, 
Huffman encoded and then stored as the compressed image. In 
the decompression phase, applying the pixel locations to the 
trained network yields the intensity at its output. The results of 
experiments on more than 4000 different images indicate a 
higher compression rate for the proposed structure compared 
with commonly used methods such as the Comité Consultatif 
International Téléphonique et Télégraphique (CCITT) G4 and 
Joint Bi-level Image Experts Group (JBIG2) standards. High 
compression ratios as well as high PSNRs were obtained using 
the proposed method. In the future, the authors plan to use 
activity, pattern based criteria and some complexity measures 
to adaptively obtain a high compression rate. 

In 2013, C. Rengarajaswamy et al. presented a novel technique 
that performs both encryption and compression of an image 
[14]. In this method, a stream cipher is used to encrypt the 
image, after which SPIHT is used for image compression. The 
stream cipher is used to provide better encryption. SPIHT 
provides better compression, since larger images can be 
chosen and can be decompressed with minimal or no loss 
relative to the original image. The paper thus aims to achieve 
both confidential encryption and a good compression rate in 
order to provide better security. 

In 2013, S. Srikanth et al. presented a technique for image 
compression that uses different embedded wavelet based image 
coding schemes with a Huffman encoder for further 
compression [15]. They implemented the SPIHT and EZW 
algorithms with Huffman encoding using different wavelet 
families and then compared the PSNRs and bit rates of these 
families. The algorithms were tested on different images, and 
the results show good quality and a high compression ratio 
compared with previously existing lossless image compression 
techniques. 

In 2013, Pralhadrao V Shantagiri et al. presented a new spatial 
domain lossless image compression algorithm for synthetic 
color images of 24 bits [16]. The proposed algorithm 
compresses an image by reducing the size of its pixels: each 
pixel is represented using only the required number of bits 
instead of 8 bits per color. The algorithm was applied to a set 
of test images, and the results obtained are encouraging. The 
authors also compare it with Huffman, TIFF, PPM-tree, and 
GPPM. The paper introduces the principles of the PSR (Pixel 
Size Reduction) lossless image compression algorithm and 
shows the procedures of compression and decompression. 
Future work is to use other tree based lossless image 
compression algorithms. 

In 2013, K. Rajkumar et al. presented an implementation of 
multiwavelet transform coding for lossless image compression 
[17]. The paper studies the performance of the IMWT (Integer 
Multiwavelet Transform) for lossless compression of images 
with magnitude set coding. In the proposed technique, the 
transform coefficients are coded with magnitude set coding and 
run length encoding. The IMWT provides good results for the 
reconstructed image, and it was found that the IMWT can be 
used for lossless image compression. The bit rate obtained 
using MS-VLI (Magnitude Set-Variable Length Integer 
Representation) with the RLE scheme is about 2.1 bpp (bits per 
pixel) to 3.1 bpp less than that obtained using MS-VLI without 
the RLE scheme. 

In 2013, S. Dharanidharan et al. presented a new modified 
international data encryption algorithm to encrypt the full 
image in an efficient and secure manner. After encryption, the 
original file is segmented and converted to other image files. 
Using the Huffman algorithm, the segmented image files are 
merged and compressed into a single image, from which a fully 
decrypted image is finally retrieved. They then look for an 
efficient way to transfer the encrypted images using multipath 
routing techniques: the compressed image, previously sent 
over a single pathway, is now sent using a multipath routing 
algorithm to obtain efficient and reliable transmission. 



VI. Conclusion 

This paper presents various techniques of image 
compression. Image compression is still a challenging task for 
researchers and academicians. There are mainly two types of 
image compression techniques: lossless and lossy. Comparing 
the performance of compression techniques is difficult unless 
identical data sets and performance measures are used. Some 
of these techniques are good for certain applications, such as 
security technologies. After studying all the techniques, it is 
found that lossless image compression techniques are more 
effective than lossy compression techniques, although lossy 
compression provides a higher compression ratio than lossless 
compression. 

Abstract — In this research work the Resource Reservation 
Protocol (RSVP), which works on a receiver-oriented approach, 
is used. Two different networks have been designed and 
implemented using OPNET. In the first scenario the clients are 
available with and without the use of RSVP. In this scenario, 
the parameters that have been selected, simulated and analyzed 
are the reservation status message, the reservation and path 
states in "All Values" mode, the traffic delay experienced in 
the form of the end-to-end delay with and without the use of 
RSVP, and the packet delay variation with and without RSVP. 
The analysis reveals that the attempted reservation was 
successful, the number of reservation and path states was one, 
the end-to-end delay with the use of RSVP was lower than 
without the use of RSVP, and the packet delay variation for the 
node with RSVP was lower than that of the node not using 
RSVP. In another scenario the network was duplicated, but the 
link used for connecting the subnets was changed from DS1 
(1.544 Mbps) to DS3 (44.736 Mbps). The parametric analysis 
indicated that the end-to-end delay and packet delay variation 
for the network with DS3 as the link were lower than for the 
network with DS1. 

Keywords: RSVP, OPNET 

I. Introduction 
An internetwork is a collection of individual networks, 
connected by intermediate networking devices, that functions 
as a single large network. Internetworking refers to the 
industry, products, and procedures that meet the challenge of 
creating and administering internetworks. Figure 1 illustrates 
some different kinds of network technologies that can be 
interconnected by routers and other networking devices to 
create an internetwork. Implementing a functional internetwork 
is no simple task. Many challenges must be faced, especially in 
the areas of connectivity, reliability, network management, and 
flexibility. Each area is a key in establishing an efficient and 
effective internetwork. The challenge when connecting various 
systems is to support communication among disparate 
technologies. Different sites, for example, may use different 
types of media operating at varying speeds, or may even 
include different types of systems that need to communicate. 
Because companies rely heavily on data communication, 
internetworks must provide a certain level of reliability. This is 
an unpredictable world, so many large internetworks include 
redundancy to allow for communication even when problems 
occur. 




Figure 1 : Internetwork using different Network Technologies 

Furthermore, network management must provide centralized 
support and troubleshooting capabilities in an internetwork. 
Configuration, security, performance, and other issues must be 
adequately addressed for the internetwork to function 
smoothly. Security within an internetwork is essential. Because 
nothing in this world is stagnant, internetworks must be flexible 
enough to change with new demands. 

1.1 Routing Protocols 
The OSI model provides a conceptual framework for 
communication between computers, but the model itself is not 
a method of communication. Actual communication is made 
possible by using communication protocols. In the context of 
data networking, a protocol is a formal set of rules and 
conventions that governs how computers exchange information 
over a network medium. A protocol implements the functions 
of one or more of the OSI layers. A wide variety of 
communication protocols exist. Some of these protocols 
include LAN protocols, WAN protocols, network protocols, 
and routing protocols. 

Figure 3: WAN Technologies in OSI Model 

LAN protocols operate at the physical and data link layers of 
the OSI model and define communication over the various 
LAN media. WAN protocols operate at the lowest three layers 
of the OSI model and define communication over the various 
wide-area media. Routing protocols are network layer 
protocols that are responsible for exchanging information 
between routers so that the routers can select the proper path 
for network traffic. Finally, network protocols are the various 
upper-layer protocols that exist in a given protocol suite. Many 
protocols rely on others for operation. For example, many 
routing protocols use network protocols to exchange 
information between routers. Figure 2 illustrates how several 
popular LAN protocols map to the OSI reference model. Figure 
3 illustrates the relationship between the common WAN 
technologies and the OSI model. Routing algorithms often have 
one or more of the following design goals: 

• Optimality 

• Simplicity and low overhead 

• Robustness and stability 

• Rapid convergence 

• Flexibility 

II. ROUTED PROTOCOLS 

Routed protocols are transported by routing protocols across an 
internetwork. In general, routed protocols in this context also 
are referred to as network protocols. These network protocols 
perform a variety of functions required for communication 
between user applications in source and destination devices, 
and these functions can differ widely among protocol suites. 
Network protocols occur at the upper five layers of the OSI 
reference model: the network layer, the transport layer, the 
session layer, the presentation layer, and the application layer. 
Confusion about the terms routed protocol and routing protocol 
is common. Routed protocols are protocols that are routed over 
an internetwork. Examples of such protocols are the Internet 
Protocol (IP), DECnet, AppleTalk, Novell NetWare, OSI, 
Banyan VINES, and Xerox Network System (XNS). Routing 
protocols, on the other hand, are protocols that implement 
routing algorithms. Put simply, routing protocols are used by 
intermediate systems to build tables used in determining path 
selection of routed protocols. Examples of these protocols 
include Interior Gateway Routing Protocol (IGRP), Enhanced 
Interior Gateway Routing Protocol (Enhanced IGRP), Open 
Shortest Path First (OSPF), Exterior Gateway Protocol (EGP), 
Border Gateway Protocol (BGP), Intermediate System-to- 
Intermediate System (IS-IS), and Routing Information Protocol 
(RIP). 



II.1 RSVP PROTOCOL 
The Resource Reservation Protocol (RSVP) is a Transport 
Layer protocol designed to reserve resources across a network 
for an integrated services Internet. RSVP operates over an IPv4 
or IPv6 Internet Layer and provides receiver-initiated setup of 
resource reservations for multicast or unicast data flows with 
scaling and robustness. It does not transport application data 
but is similar to a control protocol, like Internet Control 
Message Protocol (ICMP) or Internet Group Management 
Protocol (IGMP). RSVP is described in RFC 2205. RSVP can 
be used by either hosts or routers to request or deliver specific 
levels of quality of service (QoS) for application data streams 
or flows. RSVP defines how applications place reservations 
and how they can relinquish the reserved resources once the 
need for them has ended. RSVP operation will generally result 
in resources being reserved in each node along a path. RSVP is 
not a routing protocol and was designed to inter-operate with 
current and future routing protocols. RSVP by itself is rarely 
deployed in telecommunications networks today but the traffic 
engineering extension of RSVP, or RSVP-TE, is becoming 
more widely accepted nowadays in many QoS-oriented 
networks. Next Steps in Signaling (NSIS) is a replacement for 
RSVP. The Resource Reservation Protocol (RSVP) is a 
network-control protocol that enables Internet applications to 
obtain differing qualities of service (QoS) for their data flows. 
Such a capability recognizes that different applications have 
different network performance requirements. Some 
applications, including the more traditional interactive and 
batch applications, require reliable delivery of data but do not 
impose any stringent requirements for the timeliness of 
delivery. Newer application types, including 
videoconferencing, IP telephony, and other forms of 
multimedia communications require almost the exact opposite: 
Data delivery must be timely but not necessarily reliable. Thus, 
RSVP was intended to provide IP networks with the capability 
to support the divergent performance requirements of differing 
application types. 



[Figure: a sending host and the RSVP receivers] 

Figure 4: RSVP Data Flows 

It is important to note that RSVP is not a routing protocol. 
RSVP works in conjunction with routing protocols and installs 
the equivalent of dynamic access lists along the routes that 
routing protocols calculate. Thus, implementing RSVP in an 
existing network does not require migration to a new routing 
protocol. In RSVP, a data flow is a sequence of datagrams that 
have the same source, destination (regardless of whether that 
destination is one or more physical machines), and quality of 
service. QoS requirements are communicated through a 
network via a flow specification, which is a data structure used 
by internetwork hosts to request special services from the 
internetwork. A flow specification describes the level of 
service required for that data flow. This description takes the 
form of one of three traffic types. These traffic types are 
identified by their corresponding RSVP class of service: 

1. Best-effort 

2. Rate-sensitive 

3. Delay-sensitive 
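
The notions of a data flow and a flow specification can be illustrated with the small sketch below. It is a conceptual model only and does not reproduce the RSVP message format; the class and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ServiceClass(Enum):
    """The three traffic types named in the text."""
    BEST_EFFORT = 1
    RATE_SENSITIVE = 2
    DELAY_SENSITIVE = 3


@dataclass(frozen=True)
class FlowSpec:
    """Hypothetical flow specification: endpoints plus requested service level."""
    source: str
    destination: str  # may name a single host or a multicast group
    service_class: ServiceClass


def group_into_flows(datagrams):
    """Datagrams with the same source, destination and QoS form one flow."""
    flows = defaultdict(list)
    for spec, payload in datagrams:
        flows[spec].append(payload)
    return dict(flows)
```

For example, two video datagrams sent to the same multicast group with the delay-sensitive class fall into one flow, while a mail datagram from the same source belongs to a separate best-effort flow.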

Best-effort traffic is traditional IP traffic. Applications include 
file transfer (such as mail transmissions), disk mounts, 
interactive logins, and transaction traffic. These types of 
applications require reliable delivery of data regardless of the 
amount of time needed to achieve that delivery. Best-effort 
traffic types rely upon the native TCP mechanisms to 
re-sequence datagrams received out of order, as well as to 
request retransmissions of any datagrams lost or damaged in 
transit. Rate-sensitive traffic requires a guaranteed transmission 
rate from its source to its destination. An example of such an 
application is H.323 videoconferencing, which is designed to 
run on ISDN (H.320) or ATM (H.310), but is also found on the 
Internet and many IP-based intranets. H.323 encoding is a 
constant (or nearly constant) rate, and it requires a constant 
transport rate such as is available in a circuit-switched network. 
By its very nature, IP is packet-switched. Thus, it lacks the 
mechanisms to support a constant bit rate of service for any 
given application's data flow. RSVP enables constant bit-rate 
service in packet-switched networks via its rate-sensitive level 
of service. This service is sometimes referred to as guaranteed 
bit-rate service. 

Delay-sensitive traffic is traffic that requires timeliness of 
delivery and that varies its rate accordingly. MPEG-II video, 
for example, averages about 3 to 7 Mbps, depending on the 
amount of change in the picture. As an example, 3 Mbps might 
be a picture of a painted wall, whereas 7 Mbps would be 
required for a picture of waves on the ocean. MPEG-II video 
sources send key and delta frames. Typically, 1 or 2 key frames 
per second describe the whole picture, and 13 or 28 frames 
(known as delta frames) describe the change from the key 
frame. Delta frames are usually substantially smaller than key 
frames. As a result, rates vary quite a bit from frame to frame. 
A single frame, however, requires delivery within a specific 
time frame or the CODEC (code-decode) is incapable of doing 
its job. A specific priority must be negotiated for delta-frame 
traffic. RSVP services supporting delay-sensitive traffic are 
referred to as controlled-delay service (non-real-time service) 
and predictive service (real-time service). 



III. PRESENT WORK 

The Resource Reservation Protocol (RSVP) is a Transport 
Layer protocol designed to reserve resources across a network 
for an integrated services approach. It works on the policy of 
the receiver-oriented approach: the receivers keep track of 
their own resource requirements and periodically send refresh 
messages to keep the soft state in place. RSVP uses the concept 
of a "soft state", in which all states are refreshed every 
"Refresh Interval" seconds. If the routes do not change during 
the course of the simulation (i.e. no failure/recovery or load 
balancing is used in the system), and if there is no packet loss 
in the network, then it can be assumed that all Path and 
Reservation states will not change during the lifetime of a 
session unless they are deleted. In such a scenario, there is no 
need to send refresh messages, which decreases the simulation 
time and memory requirements. If the RSVP Simulation 
Efficiency attribute is enabled, no refresh messages are sent 
and no Path and Reserve refreshes are scheduled; if it is 
disabled, refresh messages are generated. In this work, 
different network scenarios that carry real-time applications 
have been simulated. These network scenarios utilize RSVP to 
provide QoS to different types of applications such as audio, 
video or data transfer. This work focuses on how RSVP helps 
in optimizing the performance of the applications utilizing this 
protocol. The statistics that can be measured to study RSVP 
behavior are: 

• RSVP Global Statistics capture the total amount of 
RSVP traffic sent and received in the whole network. These 
statistics show the message overhead of RSVP processing in 
the network. 

• RSVP Node Statistics can be divided into three 
groups: message statistics, state statistics, and reservation 
statistics. 

— Message statistics show the number of RSVP messages 
(Path, Resv, Confirmation and total) received or sent by a node. 

— State statistics show the number of RSVP states (Path, Resv 
and Blockade), collected as average value over a period of 
time. 

— Reservation statistics show the number of successful or 
rejected reservations, collected as average values over a period 
of time. 

• IP Interface RSVP Statistics show the RSVP 
Allocated Bandwidth (bytes/sec) and Buffer Size (bytes) for 
each interface. 
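
The soft-state mechanism described at the start of this section, in which a state survives only while refresh messages keep arriving, can be sketched as follows. This is a conceptual model, not OPNET or router code; the class name and the `lifetime_factor` of 3 refresh intervals are assumptions chosen for illustration.

```python
class SoftStateTable:
    """Keeps Path/Resv-like states alive only while they are refreshed."""

    def __init__(self, refresh_interval: float, lifetime_factor: float = 3.0):
        # Assumption: a state expires after `lifetime_factor` missed intervals.
        self.lifetime = refresh_interval * lifetime_factor
        self._expires_at = {}

    def refresh(self, session: str, now: float) -> None:
        """Create the state or push its expiry time further into the future."""
        self._expires_at[session] = now + self.lifetime

    def expire(self, now: float) -> list:
        """Remove and return every state whose lifetime has run out."""
        stale = [s for s, t in self._expires_at.items() if t <= now]
        for session in stale:
            del self._expires_at[session]
        return stale

    def active(self) -> set:
        return set(self._expires_at)
```

When routes never change and no packets are lost, the refresh calls only ever re-confirm a state that would not have changed anyway, which is why the simulation can skip them.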

The goal in this simple RSVP scenario is to: 

• Observe whether traffic using RSVP reservation experiences 
less delay than traffic that does not use RSVP reservations in a 
heavily loaded network, 

• Highlight some configuration aspects, and 

• Collect and discuss selected RSVP statistics. 

The following configuration aspects of this scenario should be 
noted: 

• RSVP should be supported on all host interfaces and on the 
active interfaces of routers that support RSVP. This is done by 
editing the Interface Information table on the nodes. 

• RSVP should be supported for the audio application. This is 
configured in the Voice Application Definition table. 

• RSVP should be supported on all receivers and on any 
senders that will make reservations. 

The proposed scenario is shown in Figure 5. 

Figure 5: Scenario. 



IV. RESULTS 
RSVP is a transport layer protocol that enables a network to 
provide differentiated levels of service to specific flows of data. 
Ostensibly, different application types have different 
performance requirements. RSVP acknowledges these 
differences and provides the mechanisms necessary to detect 
the levels of performance required by different applications and 
to modify network behaviors to accommodate those required 
levels. Over time, as time and latency-sensitive applications 
mature and proliferate, RSVP's capabilities will become 
increasingly important. 

Reservation Status Messages: For any attempted reservation 
you should obtain a log message indicating whether or not the 
reservation was successful. 

Figure 1: Simulation Log Message for a Successful Reservation 

If the reservation was successful, the message gives the IP 
address of the node which sent the Reservation Confirmation 
message and the reservation parameters. If the reservation was 
unsuccessful, the message gives the reason the reservation was 
not made, the requested traffic parameters, and the IP address 
of the node which refused the reservation. 

[Log message fields: Session address, Session port, IP Protocol, Resv buffer size] 

Figure 2: Simulation Log Message for a Successful Reservation 

Figure 3: Reservation and Path States on Router 

Figure 4: End to End Delay for Traffic: with and Without RSVP 

Figure 5: Packet delay variation for Traffic: with and Without RSVP 

As expected, each Path state was created before its 
corresponding Reservation state. Since the reservation was 
made for traffic in both directions, the number of Path and 
Reservation states is one. The reservation was made for 
bandwidth 5,000 bytes/sec, and buffer size 10,000 bytes. 

Figure 6: End to End Delay variation for DS1 (1.544 Mbps) and DS3 (44.736 Mbps) link 

Figure 7: Packet delay variation for DS1 (1.544 Mbps) and DS3 (44.736 Mbps) link 

Figure 8: End to End Delay variation for DS1 (1.544 Mbps) and DS3 (44.736 Mbps) link 

Figure 9: Packet delay variation for DS1 (1.544 Mbps) and DS3 (44.736 Mbps) link 



http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 

Vol. 11, No. 11, November 2013 



Figure 11: Packet delay variation for DS1 (1.544 Mbps) and DS3 (44.736 Mbps) link 

The diagram shows the number of Reservation and Path States 
on the client router. This statistic was collected in "All Values" 
mode. The following diagrams compare the traffic delay 
experienced using RSVP with the delay experienced not using 
RSVP. As expected, traffic using RSVP reservation 
experienced less delay. The packet delay variation is the 
variance among end-to-end delays for voice packets received 
by this node. The end-to-end delay for a voice packet is 
measured from the time it is created to the time it is received. 
This statistic records data from all the nodes in the network. 
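
The two statistics defined above can be computed from packet timestamps as sketched below. The sketch assumes each packet is recorded as a (creation time, reception time) pair in seconds and uses the population variance; the simulator's exact estimator is not stated in the text.

```python
from statistics import pvariance


def end_to_end_delays(packets):
    """Delay of each packet: reception time minus creation time."""
    return [received - created for created, received in packets]


def packet_delay_variation(packets):
    """Variance among the end-to-end delays (population variance assumed)."""
    return pvariance(end_to_end_delays(packets))
```

A flow whose packets all take the same time to arrive has zero delay variation even if the delay itself is large, which is why the two statistics are reported separately.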

V. CONCLUSIONS & FUTURE SCOPE 

The analysis reveals that the attempted reservation was 
successful, the number of reservation and path states was one, 
the end-to-end delay with the use of RSVP was lower than 
without the use of RSVP, and the packet delay variation for the 
node with RSVP was lower than that of the node not using 
RSVP. In another scenario the network was the same as in the 
previous scenario, but the link used for connecting the subnets 
was changed from DS1 (1.544 Mbps) to DS3 (44.736 Mbps). 
The parametric analysis of the simulation indicated that the 
end-to-end delay and packet delay variation for the network 
with DS3 as the link were lower than for the network with 
DS1. The model does not support features such as policy 
control and static configuration of reservations. In order for 
the RSVP process to run on nodes, RSVP must be supported on 
the interface, and either the WFQ or Custom Queuing scheme 
must be used for packet scheduling. RSVP must also be 
supported on the sender and the receiver of an RSVP session. 
Possible extensions of this work are to determine the effect of 
a non-RSVP hop on the end-to-end delay and throughput, the 
effect of different reservation parameters on delay and 
throughput, and how making a reservation in only one direction 
affects delay. 
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Abstract- In today's world of information technology image 
encryption can be used for providing privacy and for protecting 
intellectual properties. During the transmission of images the 
threat of unauthorized access may increase significantly. Image 
encryption can be used to minimize these problems. In the
proposed scheme of image encryption using the poly substitution
method, we explore the possibility of exploiting the features of
genetic algorithms. In polyalphabetic substitution ciphers
the plaintext letters are enciphered differently depending upon
their placement in the text. As the name polyalphabetic suggests,
this is achieved by using several keys (combinations of two keys,
three keys, or random keys) instead of just one.

Key words: Image Encryption, Decryption, Genetic algorithm, 
poly substitution. 

I. INTRODUCTION 

The demand for effective network security is
increasing day by day. Businesses have an obligation
to protect sensitive data from loss or theft. Beyond
attending to their security needs, they have to
understand where a computer is vulnerable and how to
protect it, particularly in the present scenario, where
a user needs to be connected anyhow, anywhere,
anytime. Network security research focuses on
general security services that encompass the various
functions required of an information security facility
[4]. The most useful classification of security services
covers confidentiality, integrity, non-repudiation,
access control, availability and authenticity of
information that is exchanged over networks.
Apart from the general need for security stated above,
image encryption also plays a vital role [6]. Digital
images have several advantages: they can be processed
quickly and cost-effectively, stored compactly and
transmitted efficiently from one place to another; one
can see immediately after shooting whether an image
is good; copying is easy, and quality does not degrade
however many times an image is copied; and digital
technology offers plenty of scope for versatile image
manipulation. Alongside these advantages, misuse of
copyrighted material has become easier, because
images can be copied from the internet with a couple
of mouse clicks. Securing images has therefore
become a challenging task, which is addressed by
image encryption.



II. PROPOSED METHOD 

In this approach the image is encrypted using the poly
substitution method and a genetic algorithm. This
strengthens the confidentiality of the data, which is
the prime necessity of any organization seeking data
security. Figure 1 shows the block diagram of the
encryption and decryption process for the input, a
biometric retina image of an individual.



Figure 1 Block diagram of the proposed method (blocks: Encryption, Cipher Text, Decryption).

2.1 Genetic Algorithm 

The genetic algorithm is employed to provide an
optimization solution. It is a search algorithm
based on the mechanics of natural selection and
natural genetics. The genetic algorithm belongs to
the family of evolutionary algorithms, along with
genetic programming, evolution strategies and
evolutionary programming [3]. Evolutionary
algorithms can be considered a broad class of
stochastic optimization techniques. An evolutionary
algorithm maintains a population of candidate
solutions for the problem at hand. The population is
then evolved by the iterative application of a set of
stochastic operators, which usually consists of
mutation, recombination and selection, or
something very similar.
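
The loop described above (a population evolved by selection, recombination and mutation) can be sketched as follows. This is a generic illustration, not the paper's own procedure: the bit-string encoding, the truncation selection, the one-point crossover, the parameter values and the toy fitness function (number of 1 bits) are all assumptions chosen for brevity.

```python
import random


def evolve(fitness, n_bits=16, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Evolve bit strings towards higher fitness; returns the best individual."""
    rng = random.Random(seed)
    # Initial population of random candidate solutions.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Recombination: one-point crossover of two parents.
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


best = evolve(sum)  # toy fitness: count of 1 bits
print(best, sum(best))
```

Because the parents survive unchanged into the next generation, the best fitness in the population never decreases from one generation to the next.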

2.2 Poly Substitution Method 

2.2.1 Encryption Process 

In polyalphabetic substitution ciphers the plaintext
letters are enciphered differently depending upon
their placement in the text. As the name
polyalphabetic suggests, this is achieved by using
several keys (combinations of two keys, three keys,
or random keys) instead of just one. Concealing the
content of a message from all except the sender and
recipient (privacy or secrecy), and/or verifying the
correctness of a message to the recipient
(authentication), forms the basis of many
technological solutions to computer and
communications security problems. Among the
classical encryption techniques there are several
cipher types, namely the Mono Alphabetic
Substitution Cipher, Homophonic Substitution
Cipher, PolyGram Substitution Cipher, Transposition
Cipher and Poly Alphabetic Substitution Cipher [1].

Characteristics of polyalphabetic substitution

• In general it uses more than one substitution
alphabet, which makes cryptanalysis harder.

• Since the same plaintext letter is replaced by
several ciphertext letters, depending on which
alphabet is used, it has the added advantage
of flattening the frequency distribution.

Example 

Consider the text word "WELCOME" and take e1, e2, e3
as keys, assigned the letters a, b, c respectively, so
that the ASCII value of e1 is 97, of e2 is 98 and of e3
is 99. Add the ASCII value of e1 to the value of the
first character, e2 to the second character and e3 to
the third character, and keep adding e1, e2, e3 in
rotation to the consecutive characters. The three keys
are thus applied to each group of three consecutive
letters, and the same is continued through the
remaining text. After the key values have been added
to all characters of the given text, the resulting text is
an encrypted message; it generates a combination of
3 * (256 * 256 * 256) letters of encrypted coded text
in a 128-bit manner. To embed randomness, the
features of a genetic algorithm are used [2] [3]. After
all of this processing, a transposition takes place in
each character: one bit, either the LSB or the MSB, is
changed, which increases security. Finally the decimal
value of each updated character is taken; this
sequence is the cipher text [5]. Encryption results are
shown in the table below.
Encryption Result

Character  ASCII value  Added continuation letter  Binary value  Alter LSB  Result
W          87           184                        10111000      10111001   185
E          69           167                        10100111      10100110   166
L          76           175                        10101111      10101110   174
C          67           164                        10100100      10100101   165
O          79           177                        10110001      10110000   176
M          77           176                        10110000      10110001   177
E          69           166                        10100110      10100111   167

The Encrypted message is {185, 166, 174, 165, 176, 
177, 167} 
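
The worked example above can be reproduced with a short sketch. It covers only the key addition and the LSB flip shown in the table; the genetic-algorithm randomness step is not modeled. The function name `encrypt` and the wrap-around modulo 256 are assumptions; the text example never exceeds 255, so the wrap is not exercised here.

```python
def encrypt(text, keys="abc"):
    """Add the rotating key letters to each character, then flip the LSB."""
    cipher = []
    for i, ch in enumerate(text):
        key = ord(keys[i % len(keys)])      # e1, e2, e3 applied in rotation
        added = (ord(ch) + key) % 256       # "added continuation letter" column
        cipher.append(added ^ 0b00000001)   # "alter LSB" column
    return cipher


print(encrypt("WELCOME"))  # [185, 166, 174, 165, 176, 177, 167]
```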

2.2.2 Decryption Process

Take the decimal value of each character of the cipher
text and convert it into binary format. First reverse the
transposition by changing the same bit (LSB or MSB)
that was altered during encryption. Then subtract the
ASCII value of e1 from the value of the first character,
e2 from the second character and e3 from the third
character, and keep subtracting e1, e2, e3 in rotation
from the consecutive characters, so that the three keys
are applied to each group of three consecutive letters
through the remaining text. The decimal values that
remain are the ASCII values of the plain text. As with
encryption, the process generates a combination of
3 * (256 * 256 * 256) letters of decrypted coded text.
The decrypted message is "WELCOME"; the process is
shown in the table below.
Decryption Results

Cipher result  Binary values  Alter LSB  Subtract con. letter  Remaining ASCII values  Plain text
185            10111001       10111000   184                   87                      W
166            10100110       10100111   167                   69                      E
174            10101110       10101111   175                   76                      L
165            10100101       10100100   164                   67                      C
176            10110000       10110001   177                   79                      O
177            10110001       10110000   176                   77                      M
167            10100111       10100110   166                   69                      E

The Plain text is "WELCOME" 
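
The decryption table can be checked with a minimal sketch that inverts the two steps in reverse order: the LSB is flipped back first, then the rotating key letter is subtracted. The function name `decrypt` and the modulo 256 wrap are assumptions made for this illustration.

```python
def decrypt(cipher, keys="abc"):
    """Flip the LSB back, then subtract the rotating key letters."""
    chars = []
    for i, value in enumerate(cipher):
        restored = value ^ 0b00000001                      # undo the LSB change
        plain = (restored - ord(keys[i % len(keys)])) % 256  # remaining ASCII value
        chars.append(chr(plain))
    return "".join(chars)


print(decrypt([185, 166, 174, 165, 176, 177, 167]))  # WELCOME
```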



III. EXPERIMENTAL RESULTS 



Pixel SNO  Image pixel value  Added continuation letter  Binary representation  Alter MSB  Result
1          00000000           a                          01100001               11100001   225
2          00000111           b                          01101001               11101001   233
3          11111111           c                          01100010               11100010   226
4          11111111           d                          01100011               11100011   227
5          00000000           e                          01100101               11100101   229
6          00000111           a                          01101000               11101000   232
7          11111111           b                          01100001               11100001   225
8          11111111           c                          01100010               11100010   226
9          00000000           d                          01100100               11100100   228
10         00000111           e                          01101100               11101100   236
11         11111111           a                          01100000               11100000   224
12         11111111           b                          01100001               11100001   225
13         00000000           c                          01100011               11100011   227
14         00000001           d                          01100101               11100101   229
15         11111111           e                          01100100               11100100   228
16         11111111           a                          01100000               11100000   224
17         00000000           b                          01100010               11100010   226
18         00000000           c                          01100011               11100011   227
19         00111111           d                          10100011               00100011   35
20         11111111           e                          01100100               11100100   228
21         00000000           a                          01100001               11100001   225
22         00000000           b                          01100010               11100010   226
23         00001111           c                          01110010               11110010   242
24         11111111           d                          01100011               11100011   227
25         00000000           e                          01100101               11100101   229
26         00000000           a                          01100001               11100001   225
27         00000011           b                          01100101               11100101   229
28         11111111           c                          01100010               11100010   226
29         00000000           d                          01100100               11100100   228
30         00000000           e                          01100101               11100101   229
31         00000001           a                          01100010               11100010   226
32         11111111           b                          01100001               11100001   225
33         00000000           c                          01100011               11100011   227
34         00000000           d                          01100100               11100100   228
35         00000001           e                          01100110               11100110   230
36         11111111           a                          01100000               11100000   224
37         00000000           b                          01100010               11100010   226
38         00000000           c                          01100011               11100011   227
39         00000001           d                          01100101               11100101   229
40         11111111           e                          01100100               11100100   228
41         00000000           a                          01100001               11100001   225
42         00000000           b                          01100010               11100010   226
43         00000001           c                          01100100               11100100   228
44         11111111           d                          01100011               11100011   227
45         00000000           e                          01100101               11100101   229
46         00000000           a                          01100001               11100001   225
47         00000001           b                          01100011               11100011   227
48         11111111           c                          01100010               11100010   226
49         00000000           d                          01100100               11100100   228
50         00000000           e                          01100101               11100101   229
51         00001111           a                          01110000               11110000   240
52         11111111           b                          01100001               11100001   225
53         00000000           c                          01100011               11100011   227
54         00000000           d                          01100100               11100100   228
55         00001111           e                          01110100               11110100   244
56         11111111           a                          01100000               11100000   224
57         00000000           b                          01100010               11100010   226
58         00000000           c                          01100011               11100011   227
59         11111111           d                          01100011               11100011   227
60         00000011           e                          01101000               11101000   232
61         00000000           a                          01100001               11100001   225
62         00000000           b                          01100010               11100010   226
63         00000101           c                          01101000               11101000   232
64         11111111           d                          01100011               11100011   227
65         01100000           e                          11000101               01000101   69
66         00000000           a                          01100001               11100001   225
67         00000011           b                          01100101               11100101   229
68         11111111           c                          01100010               11100010   226
69         11100000           d                          01000100               11000100   196
70         00000000           e                          01100101               11100101   229
71         00011001           a                          01101101               11101101   237
72         11111111           b                          01100001               11100001   225
73         11110000           c                          01010011               11010011   211
74         00000000           d                          01100100               11100100   228
75         00111101           e                          10100010               00100010   34
76         11111111           a                          01100000               11100000   224
77         11110000           b                          01010010               11010010   210
78         00000000           c                          01100011               11100011   227
79         00000111           d                          01101011               11101011   235
80         11111111           e                          01100100               11100100   228
81         11110000           a                          01010001               11010001   209
82         00000000           b                          01100010               11100010   226
83         00001111           c                          01110010               11110010   242
84         11111111           d                          01100011               11100011   227
85         11110000           e                          01010101               11010101   213
86         00000000           a                          01100001               11100001   225
87         00111111           b                          10100001               10100001   161
88         11111111           c                          01100010               11100010   226
89         11111000           d                          01011100               11011100   220
90         00000000           e                          01100101               11100101   229
91         00111111           a                          10100000               10100000   160
92         11111111           b                          01100001               11100001   225
93         11111100           c                          01011111               11011111   223
94         00000000           d                          01100100               11100100   228
95         00111111           e                          10100100               00100100   36
96         11111111           a                          01100000               11100000   224
97         11111110           b                          01100000               11100000   224
98         00000000           c                          01100011               11100011   227
99         00111111           d                          10100011               00100011   35
100        11111111           e                          01100100               11100100   228
101        11111110           a                          01011111               11011111   223
102        00000000           b                          01100010               11100010   226
103        00111111           c                          10100010               00100010   34
104        11111111           d                          01100011               11100011   227
105        11111110           e                          01100011               11100011   227
106        00000000           a                          01100001               11100001   225
107        00011111           b                          10000001               00000001   1
108        00111111           c                          10100010               00100010   34
109        11111111           d                          01100011               11100011   227
110        00000000           e                          01100101               11100101   229
111        00011111           a                          10000000               00000000   0
112        11111111           b                          01100001               11100001   225
113        11111111           c                          01100010               11100010   226
114        00000000           d                          01100100               11100100   228
115        00001111           e                          01110100               11110100   244
116        11111111           a                          01100000               11100000   224
117        11111111           b                          01100001               11100001   225
118        00000000           c                          01100011               11100011   227
119        00001111           d                          01110011               11110011   243
120        11111111           e                          01100100               11100100   228
121        11111111           a                          01100000               11100000   224
122        00000000           b                          01100010               11100010   226
123        00000111           c                          01101010               11101010   234
124        11111111           d                          01100011               11100011   227
125        11111111           e                          01100100               11100100   228
126        00000000           a                          01100001               11100001   225
127        00000111           b                          01101001               11101001   233
128        11111111           c                          01100010               11100010   226



Here a 32x32 image is taken and the encryption
process is applied to it. First the image is converted
into binary format. The continuation letter is then
added to each pixel of the image; here five letters
(a to e) are used. Afterwards the transposition
technique is applied to generate randomness. The
table above shows the conversion of the image into
cipher text; the Result column holds the cipher values
of pixels 1 to 128 respectively.
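
The pixel procedure can be sketched in the same way as the text example. The sketch below assumes, based on the tabulated values, that the addition wraps modulo 256 and that the transposition is a flip of the most significant bit; the function names are illustrative and the genetic-algorithm randomness step is again not modeled.

```python
def encrypt_pixels(pixels, keys="abcde"):
    """Add the rotating key letters (mod 256), then flip the MSB."""
    out = []
    for i, p in enumerate(pixels):
        added = (p + ord(keys[i % len(keys)])) % 256  # binary representation column
        out.append(added ^ 0b10000000)                # alter MSB column
    return out


def decrypt_pixels(cipher, keys="abcde"):
    """Inverse of encrypt_pixels: flip the MSB back, then subtract the key."""
    return [((c ^ 0b10000000) - ord(keys[i % len(keys)])) % 256
            for i, c in enumerate(cipher)]


# First five pixel values of the table.
pixels = [0b00000000, 0b00000111, 0b11111111, 0b11111111, 0b00000000]
cipher = encrypt_pixels(pixels)
print(cipher)  # [225, 233, 226, 227, 229]
```

Applying `decrypt_pixels` to the output returns the original pixel values, since both steps are invertible on 8-bit values.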

IV. CONCLUSION 

The proposed methodology opens a new area of
research combining cryptography and genetic
algorithms. This methodology for encrypting and
decrypting images using a genetic algorithm is an
effective method when compared with other
cryptographic systems.

V. FUTURE SCOPE

The proposed scheme can be extended in the future
to encrypt video messages as well as sound.
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Abstract - Mobile ad hoc network (MANET) is a collection of 
wireless nodes that are distributed without dependency on any 
permanent infrastructure. MANET security has been studied in
recent years, for example the black hole threat, in which a
malicious node makes the source believe that the path to the
destination passes through it. Researchers have proposed secure
routing ideas to counter these threats. The problem is that the
security threats still exist, because they are not prevented or
avoided completely; in addition, some of the solutions adversely
affect network performance, for example by adding network
overhead and time delay. The main objective of this paper is to
discuss some recent solutions that detect a black hole node using
different strategies. One of these solutions, S-ZRP, is developed
in this paper into a new proposed solution called local intrusion
detection by bluff probe packet (LIDBPP). LIDBPP begins
detection locally at the previous node and not at the source node
as in S-ZRP, which decreases the negative impact on the
performance of an AODV-based MANET, such as network
overhead and time delay.

Keywords: LIDBPP, MANET, Black hole, AODV, Network
security.

I. INTRODUCTION 

A MANET, which consists of a number of nodes connected by
wireless links, faces many challenges, such as security threats
that hang over nodes, packets and the overall network. Such
networks are widely used for military purposes, disaster areas,
personal area networks and so on. Routing protocols are
designed for the MANET's self-regulating environment, without
protection against threats from inside or outside the network.
Many ideas have been proposed to solve the security threats;
unfortunately the problem has not been avoided completely.
The main interest of this paper is in organizing the information
about each technique and in proposing a new algorithm called
LIDBPP. This algorithm can detect and block multiple black
holes while maintaining network performance in terms of
network overhead and time delay.

The paper is organized as follows:
- Section one: introduces the subject of the paper and its
main interest.

- Section two: presents MANET routing protocols and
discusses AODV.

- Section three: defines black holes and the types of black hole
attacks.

- Section four: introduces the related work; each technique is
described, and its advantages and disadvantages are discussed
by analyzing each paper.

- Section five: develops local intrusion detection by bluff probe
packet (LIDBPP) as a new solution.

- Section six: contains a table with summarized information.

- Section seven: conclusion and future work.



II. MANET ROUTING PROTOCOLS 







Figure 1. MANET routing protocols: proactive (DSDV, OLSR, WRP), hybrid (ZRP) and reactive (AODV, DSR, TORA)



Since most of the secure routing ideas in this paper are
applied to the AODV routing protocol, the paper describes the
AODV algorithm as follows:

To find a route to the destination, the source sends a
route request packet (RREQ) directly to the destination
if there is a direct link between them; otherwise it
broadcasts the RREQ to its neighboring nodes. The
neighbors rebroadcast the RREQ to their own neighbors
until it reaches the destination or an intermediate node
that has a route to the destination. Each node records in
its tables the node from which the first RREQ came (this
information is used for sending the RREP). The
destination, or an intermediate node that holds the
fresher route to the destination according to the
destination sequence number, responds by sending a
route reply (RREP) packet to the source node along the
path established when the RREQ was sent. When the
source receives the RREP, it establishes a forward path
to the destination and sends data packets to the
destination through this path.
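
The route discovery described above can be illustrated with a deliberately simplified sketch. It models only the RREQ flood and the reverse path used by the RREP: each node remembers the neighbor from which it first received the RREQ, and the reply walks those records back to the source. Sequence numbers, replies from intermediate nodes, timers and the real AODV packet formats are left out, and the topology and function name are hypothetical.

```python
from collections import deque


def discover_route(links, source, destination):
    """Flood an RREQ hop by hop and return the path the RREP would follow."""
    # reverse_hop[node] = neighbor from which node first received the RREQ.
    reverse_hop = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            break
        for neighbour in links.get(node, ()):
            if neighbour not in reverse_hop:   # only the first RREQ copy counts
                reverse_hop[neighbour] = node
                queue.append(neighbour)
    if destination not in reverse_hop:
        return None                            # no route found
    # The RREP travels back along the recorded reverse hops.
    path = []
    node = destination
    while node is not None:
        path.append(node)
        node = reverse_hop[node]
    return path[::-1]


links = {
    "S": ["N1", "N2"],
    "N1": ["S", "N3"],
    "N2": ["S"],
    "N3": ["N1", "D"],
    "D": ["N3"],
}
print(discover_route(links, "S", "D"))  # ['S', 'N1', 'N3', 'D']
```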

III. BLACK HOLES

The black hole is one of the best-known security threats.
It is a node in the network that announces itself as
having the freshest path to the destination, and so makes
the source believe that the path to the destination passes
through it, as follows:

Figure 2. MANET with black hole

The source node sends an RREQ to N1, N2 and N3. Since N1
and N2 do not have any route to the destination, they do not
respond to the RREQ. N3 does not have a route to the
destination either, so it sends the RREQ on to its neighboring
node BH (the black hole), which sends an RREP to the source to
make it believe that BH has the freshest route to the destination.
The source node then sends its data packets to BH, but BH drops
these packets instead of sending them to the destination.

Types of black hole attacks

- Single black hole attack: there is one black hole in the
network.

- Multiple black holes attack: there is more than one black
hole in the network; the black holes cooperate with each other
against the network and cause a greater negative influence on
it. The solution for multiple black holes is more complex.

IV. RELATED WORK: Some Black Hole Solutions 

A. A Local Intrusion Detection Security Routing (LIDSR) 
mechanism 

[1] The LIDSR mechanism allows the detection of the attacker
to be done locally. When the suspected attacker node (node N5)
unicasts the RREP towards the source node (node N1), the node
previous to the attacker (node N4) performs the detection, and
not the source node (node N1) as in the SIDSR mechanism [1].
First, the previous node (node N4) buffers the RREP packet.
Second, it uses a new route to the next node (node N6) and sends
an FRREQ packet to it. When the previous node (node N4)
receives the FRREP packet from the next node (node N6), it
extracts the information from the FRREP packet and behaves
according to the following rules:

1. If the next node (N6) has a route to the attacker node (N5)
and to the destination node (N7), N4 assumes that N5 is a
trusted node. It discards the FRREP packet and unicasts the
RREP packet received from N5 to the source node (N1).

2. If the next node (N6) has no route to the destination node
(N7), to the attacker node (N5), or to both of them, the previous
node (N4) discards the buffered RREP as well as the FRREP, and
at the same time broadcasts an alarm message to announce that
there is no sufficiently secure route available to the destination
node (N7).

[1] A further scenario is the case in which the previous node
(N4) does not receive any FRREP packet from the next node
(N6). Here N4 discards the RREP packet and informs the source
node to initiate a new route discovery process to the destination.
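
The decision taken by the previous node (N4) can be summarized in a small sketch. The function name, the dictionary keys and the returned action strings are hypothetical; only the three cases themselves come from the description above.

```python
def lidsr_decision(frrep):
    """Action of the previous node (N4) after probing the next node (N6).

    frrep is None when no FRREP arrived; otherwise it is a dict with two
    booleans saying whether the next node has a route to the suspected
    attacker and to the destination.
    """
    if frrep is None:
        # No answer from the next node: drop the RREP and restart discovery.
        return "discard RREP; ask source to start new route discovery"
    if frrep["route_to_attacker"] and frrep["route_to_destination"]:
        # Rule 1: the suspect is treated as trusted.
        return "discard FRREP; forward RREP to source"
    # Rule 2: at least one of the two routes is missing.
    return "discard RREP and FRREP; broadcast alarm"


print(lidsr_decision({"route_to_attacker": True, "route_to_destination": True}))
print(lidsr_decision({"route_to_attacker": True, "route_to_destination": False}))
print(lidsr_decision(None))
```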



Figure 3. MANET with black hole (nodes in order: source node, previous node, attacker node, next node, destination node)



Advantages and disadvantages 

The simulation compares LIDSR with SIDSR (the source
intrusion detection security routing mechanism). It shows that
LIDSR causes lower network overhead and time delay and gives
higher throughput as the number of nodes, the network size and
the transmission range are varied. However, LIDSR supports
only networks with a single black hole node and cannot deal
with networks containing multiple cooperative black hole nodes.

B. BDSR Scheme 

[2] This paper proposes BDSR, which merges proactive
and reactive defense architectures in MANET. BDSR baits
the malicious node into replying with an RREP by using a
virtual, nonexistent destination address. The detected black
hole node is then entered in the black hole list, and all other
nodes in the network are notified to stop any communication
with it. The bait request, RREQ', uses the same method as the
RREQ of DSR and survives only for a limited period of time.
The scheme takes advantage of the black hole's characteristic
behavior: it fakes shortest-route information and replies
directly to the source node, so the baited black hole node
answers the RREQ' with an RREP. Because the RREP is
modified so that it shows the address of the malicious node,
the scheme is able to eliminate malicious nodes from the
network in the initial period.

Advantages and disadvantages 

The simulation results show that the packet delivery
ratio (PDR) of BDSR is higher than that of the watchdog
solution [4]. (The watchdog uses neighbor nodes to overhear
traffic and identifies a malicious node by detecting whether
packets are deliberately discarded.) In addition, BDSR causes
less overhead than watchdog.

However, BDSR cannot deal with multiple cooperative
black holes.

C. Detection by broadcasting the bluff probe packet (S-ZRP) 

[3] Suppose L1, L2, L3, ..., Ln-1 are the nodes
between the source L0 and the destination Ln (Ln is
considered to be the black hole node). The algorithm works as
follows. To detect the black hole node, the origin L0 sends a
bluff RREQ packet, which contains the address of a nonexistent
node, to the nearest guard node L2. The guard node checks its
table for an entry for the nonexistent node. If there is none, it
propagates the RREQ message through the intermediate nodes
up to node Ln-1. The previous next hop Ln-1 delivers the RREQ
message to the destination Ln. The black hole node replies,
claiming to have the shortest route to the nonexistent node, and
sends this RREP packet back along the nodes of the discovered
route. The origin node L0 receives the RREP(NE) packet via
Ln-1, ..., 2, 1 and sends a BLOCK(Ln, NE) IERP/BRP packet to
node Ln-1, which deletes its entry for node Ln. The originator
node or the guard node then broadcasts this information to all
the nodes.

Advantages and disadvantages 

S-ZRP is an efficient solution for detecting multiple black
hole nodes and stopping their attack; the simulation shows how
the approach prevents the black hole nodes from receiving and
relaying packets.

However, S-ZRP starts the detection process from the source,
a strategy that negatively affects MANET performance. In
addition, the simulation still needs to show how S-ZRP affects
important performance metrics such as network overhead and
time delay.

V. A PROPOSED SOLUTION: A local Intrusion Detection 
by Bluff Probe Packet (LIDBPP) 

The paper proposes a method based on a bluff packet
to detect and stop the black hole attack in an AODV-based
MANET. The method can deal with attacks by multiple black
holes. Detection starts by sending a bluff packet that includes a
specific virtual destination address; an intermediate node (the
node previous to the black hole) sends the bluff packet and takes
the decision without intervention from the source node, as
follows:

- If the RREQ includes a normal address and the node
has a route to the destination, it sends an RREP.

- If the RREQ includes a normal address and the node
has no route to the destination, it forwards the RREQ to
the next nodes.

- If the RREQ includes the specific virtual address, it is a
bluff packet and must be forwarded. If a node sends this
packet and then receives an RREP from the next node, it
must send a block packet, because the next node is a black
hole node.

As in figure 4, since a black hole node sends an RREP
regardless of the address in the RREQ, it responds to the
bluff packet and is therefore blocked by the previous node.







Figure 4. A proposed solution (Ln-1 blocks Ln, updates its routing table and broadcasts updates)



- After blocking the black hole, the previous node sends the
bluff packet again, this time to the node located after the
blocked node, and the process is repeated until all the black
hole nodes are blocked, as in figure 5. There is no need in
this process to go back to the source node; every intermediate
node is responsible for blocking all the black hole nodes
located next to it.

- Each bluff packet generated by the source cleans the
network, because the bluff packet moves from one node to
the next as a serial process.











Figure 5. A proposed solution (Ln-1 blocks Ln+1, updates its routing table and broadcasts updates)



By starting from the previous node, there is no need to
return to the source node, so detection and blocking take place
with a minimal number of packets and in a short time, which
minimizes network overhead and time delay. In S-ZRP [3], by
contrast, the detection process needs more steps and messages
to detect and block the black hole node, especially if the
distance between the source and the black hole node is long;
this negatively affects network performance by increasing
network overhead and time delay.

The algorithm of LIDBPP is as follows:

L0: source node; L1, L2, ..., Ln, Ln+1: intermediate nodes;
RREQn: RREQ with a normal destination address; RREQs:
RREQ with the specific, virtual destination address.

Stage 1: Source node L0
    Generate RREQ
    Propagate RREQ
    If RREQn Then
        Proceed with the normal AODV algorithm
Stage 2: Else if RREQs && Ln sends RREP to Ln-1 Then
Stage 3:    Ln-1 sends block to Ln
            Ln receives block
Stage 4:    Ln-1 updates routing table and broadcasts updates
    Else
        Ln sends RREQs to Ln+1
    End if
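
The stages above can be traced with a minimal sketch of the bluff-packet branch (RREQs) only. It rests on strong simplifying assumptions that are not in the paper: the nodes form a single chain, honest nodes never answer the bluff RREQ, every black hole always answers it, and the probing node can still reach the node after a blocked one (as figure 5 suggests). Routing-table updates and broadcasts are reduced to recording who blocked whom, and all names are illustrative.

```python
def lidbpp_probe(chain, black_holes):
    """Walk a bluff RREQ along the chain; return (blocker, blocked) pairs."""
    blocked = []
    prober = chain[0]                  # L0 generates the bluff RREQ (Stage 1)
    for node in chain[1:]:
        if node in black_holes:
            # Stage 2: the node answers the bluff RREQ with an RREP.
            # Stages 3 and 4: the previous node blocks it and updates routes.
            blocked.append((prober, node))
            # The same previous node then probes the node after the blocked one.
        else:
            # Honest node: no RREP, so it forwards the bluff RREQ onwards.
            prober = node
    return blocked


chain = ["L0", "L1", "L2", "L3", "L4", "L5"]
print(lidbpp_probe(chain, {"L3", "L4"}))  # [('L2', 'L3'), ('L2', 'L4')]
```

In this trace the source L0 is never contacted again after it sends the bluff packet, which is the local-detection property the method relies on.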

VI. A SUMMARY TABLE 

Table 1. A SUMMARY TABLE

Solution    Routing protocol  Black hole attack     Strategy
[1] SIDSR   AODV              Single black hole     The source node is used for intrusion detection (source detection).
[1] LIDSR   AODV              Single black hole     The previous node from the attacker node is used for intrusion detection (local detection).
[2] BDSR    DSR               Single black hole     Baiting a single black hole by using a virtual and non-existent destination address.
[3] S-ZRP   ZRP               Multiple black holes  Baiting multiple black holes by broadcasting the bluff probe packet that contains a virtual and non-existent destination address (source detection).
LIDBPP      AODV              Multiple black holes  Baiting multiple black holes by broadcasting the bluff probe packet that contains a virtual and non-existent destination address (local detection).



VII. CONCLUSION 

The paper introduced several recent solutions that detect a
black hole node using different strategies. It explained how
these methods work, discussed the advantages and
disadvantages of each solution, and included a table
summarizing the analyzed information for each solution. The
paper also presented a new solution for detecting multiple black
holes, based on a bluff probe packet (containing a specific
virtual destination address) that tricks the black hole.

This approach begins the detection process at the node
previous to the black hole. Compared with a method that starts
sending the bluff packet from the source, as in the S-ZRP
solution, it decreases the steps needed to detect and block the
black holes, so the new method is expected to detect and block
black hole nodes with high efficiency and less negative impact
on MANET performance. In the future, the new solution
(LIDBPP) must be simulated to study the performance of the
MANET and to compare it with the other solutions.
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ABSTRACT" -There are many non Poisson queuing 
models. This paper mainly deals with the analysis of the
non-Poisson queues (M/G/1):(GD/∞/∞) and
(M_i/G_i/1):(NPRP/∞/∞). The feasibility of the system is
analyzed based on numerical calculations and graphical
representations. When the mean system size and the queue
size are high, an optimized value is obtained so that the
total expected cost is minimized. We outline here an
approach that may be used to analyze a non-Poisson model
which has job classes of multiple priorities. The priority
discipline followed may be either non-preemptive or
preemptive in nature. When the priority discipline is
non-preemptive, a job in service is allowed to complete its
service normally even if a job of higher priority enters the
queue while its service is going on. In the preemptive case,
the service to the ongoing job is preempted by the new
arrival of higher priority. If the priority discipline is
preemptive resume, then service to the interrupted job, when
it restarts, continues from the point at which the service
was interrupted. For the preemptive non-resume case, service
already provided to the interrupted job is forgotten and its
service is started again from the beginning. Note that there
may be loss of work in the preemptive non-resume priority
case. Such loss of work does not happen with the other two
priority disciplines. Since the service times are assumed to
be exponentially distributed, they satisfy the memoryless
property and, therefore, the results are the same for both
the preemptive resume and preemptive non-resume cases.

Key Words- Pollaczek-Khintchine formula; Priority
service discipline; Non-Poisson queues



I. INTRODUCTION 

Queuing models are widely used in industry to
improve customer service, for example in
supermarkets, banks and motorway toll booths. In a
market economy, customers who receive poor service
go elsewhere [Movaghar, 1998]; however, when hospital
beds are unavailable, patients have no option but to
wait at home, in accident and emergency departments,
or in wards which are inappropriately staffed, possibly
without access to appropriate specialized equipment.
For a P-priority system, we consider jobs of class 1 to
be the lowest priority and jobs of class P to be the
highest priority. We consider here queues with both
preemptive and non-preemptive priority disciplines.
The approach suggested may be applied to both
single-server and multi-server queues [Kimura, 1994;
Whitt, 1992]. Moreover, queues with both finite and
infinite buffers may be analyzed using the suggested
approach. We outline the way in which such a queue
may actually be analyzed by solving for the mean
system size and queue size, based on numerical
calculation and graphical representations. Queues in
which arrivals and/or departures do not follow the
Poisson axioms are called non-Poisson queues.




Fig 1 The M/G/1 queue (Poisson arrivals; general service time
distribution)






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 
Vol. 11, No. 11, November 2013 



II. RELATED WORK 
Markov chains with generator matrices or block
matrices of this form are called M/G/1-type Markov
chains, a term coined by M. F. Neuts [Meini, 1998].



III. MATERIALS AND METHODOLOGY 

Table 1 (M/G/1):(GD/∞/∞)
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A. Techniques adopted for the development of 
Non-Poisson queues 

I) Phase technique: This technique is used when
an arrival demands several phases of service, say k in
number.

II) Imbedded Markov chain technique: The
technique by which non-Markovian queues are
reduced to Markovian ones is termed the imbedded
Markov chain technique.

III) Supplementary variable technique: When one
or more random variables are added to convert a
non-Markovian process into a Markovian process,
the technique involved is called the supplementary
variable technique. This technique is used for the
queuing models (M/G/1), (GI/G/C), (GI/M/S)
and (GI/E_k/1).



B. Description of Non-Poisson queues 
The aim of this study is to compare the properties,
namely the mean and variance, of the two queues. In this
model,

M     Poisson arrivals
G     General output (service time) distribution
∞     Waiting room capacity is infinite
GD    General service discipline, such as First Come First
      Served (FCFS), which serves jobs in the order they
      arrive, or Last Come First Served (LCFS), which
      non-preemptively serves the job that arrived most
      recently

To determine the mean queue length and mean
waiting time for this system, the following notation is
used:

n     number of customers in the system just after a
      customer departs
t     time to serve the customer following the one
      already departed
f(t)  probability density function of the service time
      distribution, with mean E(t) and variance Var(t)
k     number of new arrivals during t
n'    number of customers left behind the next
      departing customer

The result of this model can be applied to any of the
service disciplines covered by GD, such as FCFS and
LCFS. The derivation for a single-server situation is
based on the following assumptions:

• Poisson arrivals with arrival rate λ

• General service time distribution with mean
E(t) and variance Var(t)

• Steady-state conditions prevail with ρ = λE(t) < 1

Under the above assumptions, the Pollaczek-Khintchine
formula is given by

$$L_s = \lambda E(t) + \frac{\lambda^2\,[E^2(t) + \mathrm{Var}(t)]}{2\,[1 - \lambda E(t)]}$$

Thus

$$L_q = L_s - \lambda E(t), \qquad W_q = \frac{L_q}{\lambda}$$
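
As a numerical illustration of the Pollaczek-Khintchine formula, the following minimal sketch computes L_s, L_q and W_q from the arrival rate, the mean service time and the service-time variance. The function name `mg1_metrics` and its parameter names are assumptions introduced here, and the relation W_s = W_q + E(t) is the standard one rather than something stated at this point in the text.

```python
def mg1_metrics(lam, mean_service, var_service):
    """Steady-state measures of an M/G/1 queue (Pollaczek-Khintchine).

    lam          -- Poisson arrival rate (lambda), must be positive
    mean_service -- E(t), mean service time
    var_service  -- Var(t), variance of the service time
    """
    if lam <= 0:
        raise ValueError("arrival rate must be positive")
    rho = lam * mean_service  # traffic intensity, rho = lambda * E(t)
    if rho >= 1:
        raise ValueError("steady state requires lambda * E(t) < 1")

    # L_s = lambda E(t) + lambda^2 [E^2(t) + Var(t)] / (2 [1 - lambda E(t)])
    ls = rho + lam ** 2 * (mean_service ** 2 + var_service) / (2 * (1 - rho))
    lq = ls - rho            # L_q = L_s - lambda E(t)
    wq = lq / lam            # W_q = L_q / lambda
    ws = wq + mean_service   # assumed standard relation W_s = W_q + E(t)
    return {"Ls": ls, "Lq": lq, "Wq": wq, "Ws": ws}


if __name__ == "__main__":
    # Exponential service with rate mu = 3 has E(t) = 1/3 and Var(t) = 1/9.
    print(mg1_metrics(lam=2.0, mean_service=1 / 3, var_service=1 / 9))
    # Deterministic service with the same mean has Var(t) = 0.
    print(mg1_metrics(lam=2.0, mean_service=1 / 3, var_service=0.0))
```

With exponential service, Var(t) = E²(t) and the formula reduces to the familiar M/M/1 result L_s = ρ/(1 − ρ); with the same mean but zero variance the queue is shorter, which shows how the variance term drives congestion.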

In this model, the queue and the service facility together
represent the system. Let T be the time at which the j-th
customer departs and (T+t) the time at which the next,
(j+1)-st, customer departs. If n > 0, there are (n-1+k)
customers in the system at time (T+t). Being counted in
the system does not necessarily mean that these customers
have entered service.

By the steady-state assumptions,

$$E(n) = E(n'), \qquad E(n^2) = E[(n')^2]$$

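From the above diagram,

$$n' = \begin{cases} n - 1 + k, & \text{if } n > 0 \\ k, & \text{if } n = 0 \end{cases}$$

Let

$$\delta = \begin{cases} 0, & \text{if } n > 0 \\ 1, & \text{if } n = 0 \end{cases}$$

Therefore $n' = n - 1 + \delta + k$ and

$$E(n') = E(n) + E(\delta) + E(k) - 1$$

Since $E(n') = E(n)$, we have $E(\delta) = 1 - E(k)$. Also,

$$(n')^2 = [n + (k-1) + \delta]^2 = n^2 + k^2 + 2n(k-1) + \delta(2k-1) + 1 - 2k \quad [\text{since } \delta^2 = \delta \text{ and } \delta n = 0]$$

Therefore

$$E(n) = \frac{E(k^2) - 2E(k) + E(\delta)\,[2E(k) - 1] + 1}{2\,[1 - E(k)]} = \frac{E(k^2) + E(k) - 2E^2(k)}{2\,[1 - E(k)]}$$

Since the arrivals in time t follow the Poisson
distribution,

$$E(k \mid t) = \lambda t, \qquad E(k^2 \mid t) = (\lambda t)^2 + \lambda t$$

Hence

$$E(k) = \int_0^\infty E(k \mid t)\, f(t)\, dt = \lambda E(t)$$

$$E(k^2) = \int_0^\infty E(k^2 \mid t)\, f(t)\, dt = \lambda^2 \mathrm{Var}(t) + \lambda^2 E^2(t) + \lambda E(t)$$

Thus

$$L_s = E(n) = \lambda E(t) + \frac{\lambda^2\,[E^2(t) + \mathrm{Var}(t)]}{2\,[1 - \lambda E(t)]}$$

(M_i/G_i/1): (NPRP/∞/∞)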



Here we introduce the concept of priority service
disciplines. A priority service discipline follows one of
two rules:

1. Preemptive rule, where the service of a low-priority
customer may be interrupted in favour of an arriving
customer with higher priority.

2. Non-preemptive rule, where a customer, once in
service, continues until his service is completed,
regardless of the priority of the arriving customer.

(M_i/G_i/1): (NPRP/∞/∞) is one of the non-preemptive
models. These apply to single and multiple channel
cases. The single channel model assumes Poisson
arrivals and an arbitrary service distribution. In the
multiple channel model both arrivals and departures
are assumed to be Poisson. The symbol NPRP is used
with the Kendall notation to represent the
non-preemptive discipline, while the symbols M_i and
G_i represent the Poisson and arbitrary distributions
for the i-th queue, i = 1, 2, ..., m.

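For the i-th queue, let

E_i(t)    mean service time
Var_i(t)  variance of the service time
λ_i       arrival rate at the i-th queue per unit time

The results are given (under the usual assumptions) by

$$W_q^{(k)} = \frac{\sum_{i=1}^{m} \lambda_i\,[E_i^2(t) + \mathrm{Var}_i(t)]}{2\,(1 - S_{k-1})(1 - S_k)}$$

$$L_q^{(k)} = \lambda_k W_q^{(k)}, \qquad W_s^{(k)} = W_q^{(k)} + E_k(t), \qquad L_s^{(k)} = L_q^{(k)} + \rho_k$$

where $\rho_k = \lambda_k E_k(t)$, $S_k = \sum_{i=1}^{k} \rho_i < 1$, $k = 1, \dots, m$, and $S_0 = 0$.

The expected waiting time in the queue for any
customer regardless of his priority is

$$W_q = \sum_{k=1}^{m} \frac{\lambda_k}{\lambda}\, W_q^{(k)}$$

where $\lambda = \sum_{i=1}^{m} \lambda_i$ and $\lambda_k / \lambda$ is the relative
weight of $W_q^{(k)}$. Similarly for $W_s$.

Let F(t) be the CDF of the combined service time for the
entire system and let $\lambda = \sum_{i=1}^{m} \lambda_i$ be the combined arrival
rate. Then

$$\lambda F(t) = \lambda_1 F_1(t) + \lambda_2 F_2(t) + \dots + \lambda_m F_m(t)$$

which means that the effective number serviced in the
entire system equals the sum of the effective numbers
serviced in the different priority queues. Consequently,

$$F(t) = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda}\, F_i(t)$$

$$E(t) = \int_0^\infty t\, dF(t) = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda}\, E_i(t)$$

and

$$E(t^2) = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda}\,[E_i^2(t) + \mathrm{Var}_i(t)]$$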

$T_q^{(k)} = t_1 - t$ is the waiting time in the queue of a
customer of the k-th priority queue who arrives at t and
starts service at $t_1$. Let $\xi_i$ be the service times of the
customers already waiting in queues 1 through k. During
his waiting time $T_q^{(k)}$, other customers may arrive in
queues 1 through (k-1); let $\xi_i'$ be their service times.
Thus, if $\xi_0$ is the time to finish the service of the
customer already in service, and because each queue
operates on the FCFS discipline,

$$T_q^{(k)} = \xi_0 + \sum_{i=1}^{k} \xi_i + \sum_{i=1}^{k-1} \xi_i'$$

and

$$E[T_q^{(k)}] = W_q^{(k)}$$

so we have

$$W_q^{(k)} = E[\xi_0] + \sum_{i=1}^{k} E[\xi_i] + \sum_{i=1}^{k-1} E[\xi_i']$$

$$E[\xi_i] = \rho_i W_q^{(i)}, \qquad \text{where } \rho_i = \lambda_i E_i(t)$$

$E[\xi_i']$ = [expected number of arrivals in the i-th
queue during $T_q^{(k)}$] × [expected service time per
customer] = $\rho_i W_q^{(k)}$

Hence

$$W_q^{(k)} = \frac{E[\xi_0] + \sum_{i=1}^{k} \rho_i W_q^{(i)}}{1 - S_{k-1}}, \qquad \text{where } S_k = \sum_{i=1}^{k} \rho_i$$

Using induction on k, we have

$$W_q^{(k)} = \frac{E[\xi_0]}{(1 - S_{k-1})(1 - S_k)}$$

and for the k-th queue

$$L_q^{(k)} = \lambda_k W_q^{(k)} \quad \text{and} \quad L_s^{(k)} = \lambda_k W_s^{(k)}$$
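
The non-preemptive priority results can be evaluated numerically with a short sketch such as the one below. It is not taken from the paper: the function name `nprp_metrics` and its list-based interface are assumptions, and it assumes that the mean residual service time equals half the sum of λ_i [E_i²(t) + Var_i(t)], which is the numerator of the standard waiting-time formula for this model. Note that in these formulas index k = 1 denotes the class that is served first.

```python
def nprp_metrics(lams, means, variances):
    """Per-class measures of an (M_i/G_i/1):(NPRP/inf/inf) queue.

    lams      -- arrival rates lambda_1..lambda_m (class 1 is served first)
    means     -- mean service times E_1(t)..E_m(t)
    variances -- service-time variances Var_1(t)..Var_m(t)
    Returns (per_class, overall_wq), where per_class is a list of dicts.
    """
    if not (len(lams) == len(means) == len(variances)):
        raise ValueError("all inputs must have the same length")

    lam_total = sum(lams)
    # Assumed mean residual service time: half of sum lambda_i (E_i^2 + Var_i)
    residual = sum(l * (m * m + v)
                   for l, m, v in zip(lams, means, variances)) / 2.0

    per_class = []
    s_prev = 0.0  # S_0 = 0
    for l, m in zip(lams, means):
        rho = l * m
        s_k = s_prev + rho  # S_k = rho_1 + ... + rho_k
        if s_k >= 1:
            raise ValueError("steady state requires S_k < 1 for every class")
        wq = residual / ((1 - s_prev) * (1 - s_k))
        lq = l * wq     # L_q^(k) = lambda_k W_q^(k)
        ws = wq + m     # W_s^(k) = W_q^(k) + E_k(t)
        ls = lq + rho   # L_s^(k) = L_q^(k) + rho_k
        per_class.append({"Wq": wq, "Lq": lq, "Ws": ws, "Ls": ls})
        s_prev = s_k

    # Overall W_q: per-class waits weighted by lambda_k / lambda
    overall_wq = sum(l / lam_total * c["Wq"]
                     for l, c in zip(lams, per_class))
    return per_class, overall_wq


if __name__ == "__main__":
    classes, wq = nprp_metrics([0.2, 0.3], [1.0, 1.0], [1.0, 1.0])
    for index, c in enumerate(classes, start=1):
        print(index, c)
    print("overall Wq:", wq)
```

With a single class the computation collapses to the Pollaczek-Khintchine result, which gives a quick consistency check between the two models.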

IV. RESULTS AND DISCUSSIONS 
Model 1: (M/G/1):(GD/∞/∞)

Fig 2 Graphical representation of the effect of improving the
average number of customers (Ls) on the arrival rate of customers (λ)

Fig 3 Arrival rate of customers (λ) versus waiting line
of the customers in the queue (Lq)

Fig 4 Graphical representation of the effect of improving the
arrival rate of customers (λ) versus customers' waiting time (Wq)

Fig 5 Graphical representation of the effect of improving the
service rate (Ws) on customers' arrival time (λ)

Fig 6 Effect of improving customers' waiting time (Wq),
average number of customers (Ls), service rate (Ws) and
waiting line of the customers in the queue (Lq) versus average
arrival rate of customers λ (lambda)

Model 2: (M_i/G_i/1): (NPRP/∞/∞)

Fig 7 Graphical representation of the effect of improving the
average number of customers (Ls) on the arrival rate of customers (λ)

Fig 8 Effect of the average arrival rate of customers λ
(lambda) on the waiting line of the customers in the queue (Lq)

Fig 9 Graphical representation of the effect of the arrival rate of
customers (λ) versus customers' waiting time (Wq)

Fig 10 Graphical representation of the effect of the service
rate (Ws) on customers' arrival time (λ)

Fig 11 Effect of improving service rate (Ws), average
number of customers (Ls), waiting line of the customers in the
queue (Lq) and average arrival rate of customers (λ)

V. CONCLUSION 
From the analysis of the above two queues,
the feasibility is comparatively admissible in the
model (M_i/G_i/1): (NPRP/∞/∞), based on the
numerical calculations and graphical representations.
One can identify the transitions between states. The
resulting set of equations is solved to get the actual
state probabilities, which are then used to obtain the
required queue performance parameters.
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Abstract — Data mining has various applications for customer 
relationship management. In this proposal, I am introducing a 
framework for identifying appropriate data mining techniques 
for various CRM activities. This Research attempts to integrate 
the data mining and CRM models and to propose a new model of 
Data mining for CRM. The new model specifies which types of 
data mining processes are suitable for which stages/processes of 
CRM. In order to develop an integrated model it is important to 
understand the existing Data mining and CRM models. Hence 
the article discusses some of the existing data mining and CRM 
models and finally proposes an integrated model of data mining 
for CRM. 


I. Introduction

Value creation for the customer is the key determinant of a
successful business. Customer satisfaction ensures profitability
for businesses in the long run. Customer bases built over a
period of time have proved to be of immense help in increasing
the reach of a particular business's product or service. However,
the recent increase in the operating costs of business has made
it more compelling for businesses to increase loyalty among
existing customers while trying to attract new ones. The
processes by which an organization creates value for the
customer are often referred to as Customer Relationship
Management (CRM) [1].

According to Microsoft, CRM is "a customer-focused 
business strategy designed to optimize revenue, profitability, 
and customer loyalty. By implementing a CRM strategy, an 
organization can improve the business processes and 
technology solutions around selling, marketing, and servicing 
functions across all customer touch-points (for example: Web, 
e-mail, phone, fax, in-person)". The overall objective of CRM 
applications is to attract, retain and manage a firm's profitable 
("right") customers [1]. 

Business intelligence for CRM applications provides a firm
with actionable information from the analysis and
interpretation of vast quantities of customer and market related
data. Databases for business intelligence include customer
demographics, buying histories, cross-sales, service calls,
website navigation experiences and online transactions.
Through the appropriate use of analytical methods and
software, a firm is able to turn data into information that leads
to greater insight and the development of fact-based strategies,
which in turn help the firm gain competitive advantage by
creating greater value for the customer [1].

Analogous to traditional mining, which involves searching for 
an ore in a mountain, data mining involves searching for 
valuable information in large databases. Both these processes 
involve either groping through a vast amount of material or 
intelligently probing the data to find the true value that lies 
hidden in data. Data mining involves not only the extraction of 
previously unknown information from a database but also the 
discovery of relationships that did not surface in the previous
methods of data analysis. The "jewels" discovered from the 
data mining process include these non-intuitive hidden 
predictive relationships between variables that explain 
customer behavior and preferences. The predictive capabilities 
of data mining enable the businesses to make proactive, 
knowledge-driven decisions. Data mining tools facilitate 
prospective analysis, which is an improvement over the 
analysis of past events provided by the retrospective tools. The 
emergence of large data warehouses and the availability of
data mining software are creating opportunities for businesses
to find innovative ways to implement effective customer
relationship strategies [1].

The automation of data collection and the relative decrease in
the costs of operating huge data warehouses have made
customer data more accessible than ever. The analysis of data,
which until a few years ago was associated with high-end
computing power and algorithms decipherable only by
professional statisticians, is becoming increasingly popular
now that user-friendly tools are available on desktops [Berger,
1999]. Data mining plays an important role in the analytical
phases of the CRM life cycle as well as the CRM process [1].



II. Research Methodology 

As the nature of research in CRM and data mining is difficult
to confine to specific disciplines, the relevant materials are
scattered across various journals. Business intelligence and
knowledge discovery are the most common academic
disciplines for data mining research in CRM. Consequently, the
following online journal databases were searched to provide a
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comprehensive bibliography of the academic literature on 
CRM and Data Mining: 

• ABI/INFORM Database;
• Academic Search Premier;
• Business Source Premier;
• Emerald Fulltext;
• Ingenta Journals;
• Science Direct; and
• IEEE Transactions.

The literature search was based on the descriptors "customer
relationship management" and "data mining", which
originally produced approximately 900 articles. The full text
of each article was reviewed to eliminate those that were not
actually related to the application of data mining techniques in
CRM. The selection criteria were as follows:

• Only those articles that had been published in business
intelligence, knowledge discovery or customer management
related journals were selected, as these were the most
appropriate outlets for data mining in CRM research and the
focus of this review.

• Only those articles which clearly described how the
mentioned data mining technique(s) could be applied and
assisted in CRM strategies were selected.

• Conference papers, masters and doctoral dissertations,
textbooks and unpublished working papers were excluded, as
academics and practitioners alike most often use journals to
acquire information and disseminate new findings. Thus,
journals represent the highest level of research.

Each article was carefully reviewed and separately classified
according to the four categories of CRM dimensions and seven
categories of data mining models. Although this search was not
exhaustive, it serves as a comprehensive base for an
understanding of data mining research in CRM.

III. Research Challenges

Data Mining Challenges & Opportunities in CRM 

In this section, we build upon our discussion of CRM and Life 
Sciences to identify key data mining challenges and 
opportunities in these application domains. The following is a 
list of challenges for CRM [2]. 

a. Non-trivial results almost always need a combination of 
DM techniques. Chaining/composition of DM, and more 
generally data analysis, operations is important. In order to 
analyze CRM data, one needs to explore the data from 
different angles and look at its different aspects. This should 
require application of different types of DM techniques and 
their application to different "slices" of data in an interactive 
and iterative fashion. Hence, the need to use various DM 
operators and combine (chain) them into a single "exploration 
plan" [2]. 
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b. There is a strong requirement for data integration 
before data mining. 

In both cases, data comes from multiple sources. For example 
in CRM, data needed may come from different departments of 
an organization. Since many interesting patterns span multiple 
data sources, there is a need to integrate these data before an 
actual data mining exploration can start [2]. 

c. Diverse data types are often encountered. 

This requires the integrated mining of diverse and 
heterogeneous data. In CRM, while dealing with this issue is 
not critical, it is nonetheless important. Customer data comes 
in the form of structured records of different data types (e.g., 
demographic data), temporal data (e.g., weblogs), text (e.g., 
emails, consumer reviews, blogs and chat-room data), 
(sometimes) audio (e.g., recorded phone conversations of 
service reps with customers) [2]. 

d. Highly and unavoidably noisy data must be dealt with. 

In CRM, weblog data has a lot of "noise" (due to crawlers, 
missed hits because of the caching problem, etc.). Other data 
pertaining to customer "touch points" has the usual cleaning 
problems seen in any business-related data [2]. 

e. Privacy and confidentiality considerations for data and
analysis results are a major issue.

In CRM, lots of demographic data is highly
confidential, as are email and phone logs. Concern about 
inference capabilities makes other forms of data sensitive as 
well — e.g., someone can recover personally identifiable 
information (PII) from web logs [2]. 

f. Legal considerations influence what data is available for 
mining and what actions are permissible. 

In some countries it is not allowed to combine data from 
different sources or to use it for purposes different from those 
for which they have been collected. For instance, it may be 
allowed to use an external rating about credit worthiness of a 
customer for credit risk evaluation but not for other purposes. 
Ownership of data can be unclear, depending on the details of 
how and why it was collected, and whether the collecting 
organization changes hands [2]. 

g. Real-world validation of results is essential for 
acceptance. 

In CRM, as in many DM applications, discovered patterns are 
often treated as hypotheses that need to be tested on new data 
using rigorous statistical tests for the actual acceptance of the 
results. This is even more so for taking or recommending 
actions, especially in such high-risk applications as in the 
financial and medical domains. Example: recommending 
investments to customers (it is actually illegal in the US to let 
software give investment advice) [2,3]. 
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h. Developing deeper models of customer behavior: 

One of the key issues in CRM is how to understand customers. 
Current models of customers are mainly built based on their
purchase patterns and click patterns at web sites. Such models 
are very shallow and do not have a deep understanding of 
customers and their individual circumstances. Thus, many 
predictions and actions about customers are wrong. It is 
suggested that information from all customer touch-points be 
considered in building customer models. Marketing and 
psychology researchers should also be involved in this effort. 
Two specific issues need to be considered here. First, what 
level should the customer model be built at, namely at the 
aggregate level, the segment level, or at the individual level? 
The deciding factor is how personalized the CRM effort needs 
to be. Second is the issue of the dimensions to be considered 
in the customer profile. These include demographic, 
psychographic, macro-behavior (buying, etc.), and micro- 
behavior (detailed actions in a store, e.g. individual clicks in 
an online store) features [2,3]. 

i. Acquiring data for deeper understanding in a non- 
intrusive, low-cost, high accuracy manner: 

In many industrial settings, collecting data for CRM is still a 
problem. Some methods are intrusive and costly. Datasets 
collected are very noisy and in different formats and reside in 
different departments of an organization. Solving these pre- 
requisite problems is essential for data mining applications [2]. 

j. Managing the "cold start/bootstrap" problem: 

At the beginning of the customer life cycle little is known, but 
the list of customers and the amount of information known for 
each customer increases over time. In most cases, a minimum 
amount of information is required for achieving acceptable 
results (for instance, product recommendations computed 
through collaborative filtering require a purchasing history of 
the customer). Being able to deal with cases where less than
this required minimum is known is therefore a major
challenge [2].

k. Evaluation framework for distinguishing between 
correct/incorrect customer understanding: 

Apart from the difficulty of building customer models, 
evaluating them is also a major task. There is still no 
satisfactory metric that can tell whether one model is better 
than another and whether a model really reflects customer 
behaviors. Although there are some metrics for measuring 
quality of customer models (e.g., there are several metrics for 
measuring the quality of recommendations), they are quite 
rudimentary, and there is a strong need to work on better 
measures. Specifically, the recommender systems community 
has explored this area [2,3,6,7]. 
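
As a concrete illustration of the kind of metric item k refers to, one widely used measure of recommendation quality is precision at k: the fraction of the top-k recommended items that the customer actually found relevant. The sketch below is not taken from the cited works; the function name `precision_at_k` and the sample item identifiers are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended items that are relevant.

    recommended -- ranked list of item identifiers, best first
    relevant    -- set of items the customer actually bought or liked
    k           -- cut-off rank, a positive integer
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Dividing by k (not len(top_k)) penalises lists shorter than k.
    return hits / k


if __name__ == "__main__":
    ranked = ["card_gold", "loan_home", "card_basic", "insurance"]
    bought = {"card_gold", "card_basic", "savings"}
    print(precision_at_k(ranked, bought, 2))
```

A measure of this kind only scores the ranking against observed behaviour; it does not show whether the underlying customer model is correct, which is the gap this challenge points to.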




Figure 1. Classification framework for data mining techniques in CRM.

The previous figure shows how data mining stages are used
throughout the CRM life cycle. The techniques considered are:

(1) Association rule; 

(2) Decision tree; 

(3) Genetic algorithm; 

(4) Neural networks; 

(5) K-Nearest neighbour; 

(6) Linear/logistic regression. 

A graphical classification framework for data mining
techniques in CRM is proposed and shown in Fig. 1; it is
based on a review of the literature on data mining techniques
in CRM. Critically reviewing this literature helped to identify
the major CRM dimensions and the data mining techniques
applied in CRM. The framework describes the CRM dimensions
as Customer Identification, Customer Attraction, Customer
Retention and Customer Development. In addition, it describes
the types of data mining model as Association, Classification,
Clustering, Forecasting, Regression, Sequence Discovery and
Visualization. We provide a brief description of the four CRM
dimensions and some references for further details; each
of them is discussed in the following section [6].

IV. Classification Framework - CRM Dimensions 

In this study, CRM is defined as helping organizations to
better discriminate and more effectively allocate resources to
the most profitable group of customers through the cycle of
customer identification, customer attraction, customer
retention and customer development.

(i) Customer identification: CRM begins with customer 
identification, which is referred to as customer 
acquisition in some articles. This phase involves 
targeting the population who are most likely to 
become customers or most profitable to the 
company. Moreover, it involves analyzing 
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customers who are being lost to the competition 
and how they can be won back. Elements for
customer identification include target customer 
analysis and customer segmentation. Target 
customer analysis involves seeking the profitable 
segments of customers through analysis of 
customers' underlying characteristics, whereas 
customer segmentation involves the subdivision 
of an entire customer base into smaller customer 
groups or segments, consisting of customers who 
are relatively similar within each specific 
segment.

(ii) Customer attraction: This is the phase following
customer identification. After identifying the
segments of potential customers, organizations 
can direct effort and resources into attracting the 
target customer segments. An element of 
customer attraction is direct marketing. Direct 
marketing is a promotion process which 
motivates customers to place orders through 
various channels. For instance, direct mail or 
coupon distribution are typical examples of 
direct marketing. 

(iii) Customer retention: This is the central concern for
CRM. Customer satisfaction, which refers to the
comparison of customers' expectations with their
perception of being satisfied, is the
essential condition for retaining customers. As
such, elements of customer retention include
one-to-one marketing, loyalty programs and
complaints management. One-to-one marketing
refers to personalized marketing campaigns
which are supported by analysing, detecting and
predicting changes in customer behaviours. Thus,
customer profiling, recommender systems or 
replenishment systems are related to one-to-one 
marketing. Loyalty programs involve campaigns 
or supporting activities which aim at maintaining 
a long term relationship with customers. 
Specifically, churn analysis, credit scoring, 
service quality or satisfaction form part of 
loyalty programs. 

(iv) Customer development: This involves consistent
expansion of transaction intensity, transaction
value and individual customer profitability. 
Elements of customer development include 
customer lifetime value analysis, up/cross selling 
and market basket analysis. Customer lifetime 
value analysis is defined as the prediction of the 
total net income a company can expect from a 
customer. Up/Cross selling refers to promotion 
activities which aim at augmenting the number 
of associated or closely related services that a 
customer uses within a firm. Market basket 
analysis aims at maximizing the customer 
transaction intensity and value by revealing 
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a. Good actioning mechanisms: 

Once data mining has been conducted with promising results, 
how to use them in the daily performance task is critical and it 
requires significant research effort. It is common that after 
some data results are obtained, the domain users do not know 
how to use them in their daily work. This research may require 
the participation of business and marketing researchers. 
Another way to accommodate actioning mechanisms is to 
integrate them into the knowledge discovery process by 
focusing on the discoveries of actionable patterns in customer 
data. This would make it easier for the marketers or other
domain experts to determine which actions should be taken 
once the customer patterns are discovered [2]. 

b. Incorporating prior knowledge: 

This has always been a problem in practice. Data mining tends
to find many patterns that are already known or
redundant. Incorporating prior domain knowledge can help to
solve these problems, and also to discover something novel.
However, the difficulties of incorporating domain knowledge
have resulted in little progress in the past. There are a number
of reasons for this. First of all, knowledge acquisition from
domain experts is very hard. This is well documented in AI
research, especially in the literature on building expert
systems. Domain experts may know a lot but are unable to
articulate it. Also, domain experts are often not sure what the
relevant domain knowledge is, which can be very wide,
although the data mining application itself is very narrow.
Only after domain experts have seen some discovered patterns
do they remember some domain knowledge. The second
reason is the algorithmic issue. Many existing methods have
difficulty incorporating sophisticated domain knowledge in
the mining algorithm. Also, once the new patterns are
discovered, it is important to develop methods that integrate 
the newly discovered knowledge with the previous knowledge 
thus enhancing the overall knowledge base. Although there is 
some general work on knowledge enhancement, much more 
needs to be done to advance this area and adapt it to CRM 
problems. Also, integration of these methods with existing and 
novel Knowledge Management approaches constitutes a 
fruitful area of research [2]. 

Customer relationship management in its broadest sense 
simply means managing all customer interactions. In practice, 
this requires using information about your customers and 
prospects to more effectively interact with your customers in 
all stages of your relationship with them. We refer to these 
stages as the customer life cycle. 

The customer life cycle has three stages: 

1. Acquiring customers

2. Increasing the value of customers 

3. Retaining good customers

Data mining can improve your profitability in each of these
stages when you integrate it with operational CRM systems or
implement it as independent applications [4].

c. Acquiring new customers via data mining [4] 

The first step in CRM is to identify prospects and convert
them to customers. Let's look at how data mining can help
manage the costs and improve the effectiveness of a customer
acquisition campaign.

Big Bank and Credit Card Company (BB&CC) annually 
conducts 25 direct mail campaigns, each of which offers one 
million people the opportunity to apply for a credit card. The 
conversion rate measures the proportion of people who 
become credit card customers, which is about one percent per 
campaign for BB&CC. 

Getting people to fill out an application for the credit card is 
only the first step. Then, BB&CC must decide if the applicant 
is a good risk and accept them as a customer or decline the 
application. Not surprisingly, poor credit risks are more likely 
to accept the offer than are good credit risks. So while six 
percent of the people on the mailing list respond with an 
application, only about 16 percent of those are suitable credit 
risks; approximately one percent of the people on the mailing 
list become customers. 

BB&CC's six percent response rate means that only 60,000
people out of one million names respond to the solicitation.
Unless BB&CC changes the nature of the solicitation - using
different mailing lists, reaching customers in different ways,
altering the terms of the offer - it is not going to receive more
than 60,000 responses. And of those 60,000 responses, only
10,000 are good enough risks to become customers. The
challenge BB&CC faces is reaching those 10,000 people most
efficiently.

BB&CC spends about $1.00 per piece, for a total cost of 
$1,000,000, to mail the solicitation. Over the next couple of 
years, the customers gained through this solicitation generate 
approximately $1,250,000 in profit for the bank (or about 
$125 each), for a net return of $250,000 from the mailing. 

Data mining can improve this return. Although data mining
won't precisely identify the 10,000 eventual credit card
customers, data mining helps focus marketing efforts much
more cost effectively.

First, BB&CC sent a test mailing of 50,000 prospects and
carefully analyzed the results, building a predictive model
showing who would respond (using a decision tree) and a
credit scoring model (using a neural net). BB&CC then
combined these two models to find the people who were both
good credit risks and were most likely to respond to the offer.

BB&CC applied the model to the remaining 950,000 people in 
the mailing list, from which 700,000 people were selected for 
the mailing. What was the result? From the 750,000 pieces 
mailed (including the test mailing), BB&CC received 9,000 
acceptable applications for credit cards. In other words, the
conversion rate rose from one percent to 1.2 percent, a 20
percent increase. While the targeted mailing only reaches
9,000 of the 10,000 prospects - no model is perfect - reaching 
the remaining 1,000 prospects is not profitable. Had they 
mailed the other 250,000 people on the mailing list, the cost of 
$250,000 would have resulted in another $125,000 of gross 
profit for a net loss of $125,000. 

Notice that the net profit from the mailing increased $125,000. 
Even when you include the $40,000 cost of the data mining 
software and the computer and employee resources used for 
this modeling effort, the net profit increased $85,000. This 
translates to a return on investment (ROI) for modeling of over 
200 percent, which far exceeds BB&CC's ROI requirements
for a project. 
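
The figures in this example can be checked with a few lines of arithmetic. The sketch below only reuses the numbers quoted in the text; the function name `campaign_net` and the reading of ROI as net gain after modeling cost divided by modeling cost are assumptions made for illustration, chosen because they reproduce the "over 200 percent" figure.

```python
# Worked arithmetic for the BB&CC mailing example (all figures from the text).
COST_PER_PIECE = 1.00        # dollars per mailed piece
PROFIT_PER_CUSTOMER = 125.0  # dollars of profit per new customer
MODELING_COST = 40_000.0     # software, computer and employee resources


def campaign_net(pieces_mailed: int, customers_gained: int) -> float:
    """Net return of a mailing: customer profit minus mailing cost."""
    return customers_gained * PROFIT_PER_CUSTOMER - pieces_mailed * COST_PER_PIECE


baseline_net = campaign_net(1_000_000, 10_000)  # untargeted mailing
targeted_net = campaign_net(750_000, 9_000)     # test mailing plus model-selected names

gain = targeted_net - baseline_net              # extra net profit from targeting
gain_after_modeling = gain - MODELING_COST      # after paying for the modeling effort
roi = gain_after_modeling / MODELING_COST       # assumed ROI definition (see lead-in)

print(f"baseline net:   ${baseline_net:,.0f}")
print(f"targeted net:   ${targeted_net:,.0f}")
print(f"gain:           ${gain:,.0f}")
print(f"after modeling: ${gain_after_modeling:,.0f}")
print(f"modeling ROI:   {roi:.1%}")
```

Under these assumptions the targeted mailing nets $375,000 against $250,000 for the untargeted one, which matches the $125,000 and $85,000 figures above.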



d. Increasing the value of your existing customers [4] 

Cannons and Carnations (C&C) is a company that specializes 
in selling antique mortars and cannons as outdoor flower pots. 
It also offers a line of indoor flower pots made from large 
caliber antique pistols and a collection of muskets that have 
been converted to unique holders of long-stemmed flowers. 
The C&C catalog is sent to about 12 million homes. 

When a customer calls C&C to place an order, C&C identifies 
the caller using caller ID when possible; otherwise the C&C 
representative asks for a phone number or customer number 
from the catalog mailing label. Next, the representative looks 
up the customer in the database and then proceeds to take the 
order. 

C&C has an excellent chance of cross-selling, or selling the 
caller something additional. But C&C discovered that if the 
first suggestion fails and the representative suggests a second 
item, the customer might get irritated and hang up without 
ordering anything. And, there are some customers who resent 
any cross-selling attempts. 

Before implementing data mining, C&C was reluctant to 
cross-sell. Without a model, the odds of making the right 
recommendation were one in three. And, because making any 
recommendation is unacceptable for some customers, C&C 
wanted to be extremely sure that it never makes a 
recommendation when it should not. In a trial campaign, C&C 
had less than a one percent sales rate and received a
substantial number of complaints. C&C was reluctant to
continue cross-selling for such a small gain.

The situation changed dramatically once C&C used data 
mining. Now the data mining model operates on the data. 
Using the customer information in the database and the new 
order, it tells the customer service representative what to 
recommend. C&C successfully sold an additional product to 
two percent of the customers and experienced virtually no 
complaints. 

Developing this capability involved a process similar to what 
was used to solve the credit card customer acquisition 
problem. As with that situation, two models were needed. 

The first model predicted if someone would be offended by 
additional product recommendations. C&C learned how its 
customers reacted by conducting a very short telephone 
survey. To be conservative, C&C counted anyone who 
declined to participate in the survey as someone who would 
find recommendations intrusive. Later on, to verify this 
assumption, C&C made recommendations to a small but 
statistically significant subset of those who had refused to 
answer the survey questions. To C&C's surprise, it discovered 
that the assumption was not warranted. This enabled C&C to 
make more recommendations and further increase profits. The 
second model predicted which offer would be most 
acceptable. 

In summary, data mining helped C&C better understand its
customers' needs. When the data mining models were
incorporated in a typical cross-selling CRM campaign, the
models helped C&C increase its profitability by two percent.

e. Increasing the value of your existing customers:
personalization via data mining [4]

Big Sam's Clothing (motto: "Rugged outdoor gear for city
dwellers") developed a Web site to supplement its catalog.
Whenever you enter Big Sam's site, the site greets you by
displaying "Howdy Pardner!" However, once you have
ordered or registered with Big Sam's, you are greeted by
name. If you have a Big Sam's ordering record, Big Sam's
will also tell you about any new products that might be of
particular interest to you. When you look at a particular
product, such as a waterproof parka, Big Sam's suggests other
items that might supplement such a purchase.

When Big Sam's first launched its site, there was no
personalization. The site was just an online version of its
catalog - nicely and efficiently done - but it didn't take
advantage of the sales opportunities the Web presents.

Data mining greatly increased Big Sam's Web site sales.
Catalogs frequently group products by type to simplify the
user's task of selecting products. In an online store, however,
the product groups may be quite different, often based on
complementing the item under consideration. In particular, the
site can take into account not only the item you're looking at,
but what is in your shopping cart as well, thus leading to even
more customized recommendations.

First, Big Sam's used clustering to discover which products
grouped together naturally. Some of the clusters were obvious,
such as shirts and pants. Others were surprising, such as books
about desert hiking and snakebite kits. They used these
groupings to make recommendations whenever someone
looked at a product.

Big Sam's then built a customer profile to help identify
customers who would be interested in the new products that
were frequently added to the catalog. Big Sam's learned that
steering people to these selected products not only resulted in
significant incremental sales, but also solidified its customer
relationships. Surveys established that Big Sam's was viewed
as a trusted advisor for clothing and gear.

To extend its reach further, Big Sam's implemented a program
through which customers could elect to receive e-mail about
new products that the data mining models predicted would
interest them. While the customers viewed this as another
example of proactive customer service, Big Sam's discovered
it was a program of profit improvement.

The personalization effort paid off for Big Sam's, which
experienced significant, measurable increases in repeat sales,
average number of sales per customer and average size of
sales.

f. Retaining good customers via data mining [4]

For almost every company, the cost of acquiring a new 
customer exceeds the cost of keeping good customers. This 
was the challenge facing KnowService, an Internet Service 
Provider (ISP) who experiences the industry-average attrition 
rate, eight percent per month. Since KnowService has one 
million customers, this means 80,000 customers leave each 
month. The cost to replace these customers is $200 each or 
$16,000,000 - plenty of incentive to start an attrition 
management program. 

The first thing KnowService needed to do was prepare the data
used to predict which customers would leave. KnowService
needed to select the variables from its customer database and,
perhaps, transform them. The bulk of KnowService's users are
dial-in clients (as opposed to clients who are always connected
through a T1 or DSL line) so KnowService knows how long
each user was connected to the Web. KnowService also knows
the volume of data transferred to and from a user's computer,
the number of e-mail accounts a user has, the number of e-mail
messages sent and received along with the customer's service
and billing history. In addition, KnowService has demographic
data that customers provided at sign-up.

Next, KnowService needed to identify who were "good"
customers. This is not a data mining question but a business
definition (such as profitability or lifetime value) followed by
a calculation. KnowService built a model to profile its
profitable customers and unprofitable customers.
KnowService used this model not only for customer retention
but to identify customers who were not yet profitable but
might become so in the future.

KnowService then built a model to predict which of its
profitable customers would leave. As in most data mining
problems, determining what data to use and how to combine
existing data is much of the challenge in model development.
For example, KnowService needed to look at time-series data
such as the monthly usage. Rather than using the raw
time-series data, it smoothed the data by taking rolling
three-month averages. KnowService also calculated the change in
the three-month average and tried that as a predictor. Some of
the factors that were good predictors, such as declining usage,
were symptoms rather than causes that could be directly
addressed. Other predictors, such as the average number of
service calls and the change in the average number of service
calls, were indicative of customer satisfaction problems worth
investigating.

Predicting who would churn, however, wasn't enough. Based
on the results of the modeling, KnowService identified some
potential programs and offers that it believed would entice
people to stay. For example, some churners were exceeding
even the largest amount of usage available for a fixed fee and
were paying substantial incremental usage fees. KnowService
offered these users a higher-fee service that included more
bundled time. Some users were offered more free disk space to
store personal Web pages. KnowService then built models that
would predict which would be the most effective offer for a
particular user.

To summarize, the churn project made use of three models.
One model identified likely churners, the next model picked
the profitable potential churners worth keeping and the third
model matched the potential churners with the most
appropriate offer. The net result was a reduction in
KnowService's churn rate from eight percent to 7.5 percent,
which allowed KnowService to save $1,000,000 per month in
customer acquisition costs.

KnowService discovered that its data mining investment paid
off - it improved customer relationships and dramatically
increased its profitability.
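
The rolling three-month average and its month-to-month change, which KnowService tried as churn predictors, can be sketched as follows. This is a minimal illustration only: the function names and the monthly usage figures are made up, and the text does not say how KnowService handled the first months, for which no full three-month window exists (they are left empty here).

```python
def rolling_mean(values, window=3):
    """Trailing rolling mean; the first window-1 positions have no value (None)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def change(series):
    """Change from one position to the next, skipping positions without data."""
    out = [None]
    for prev, cur in zip(series, series[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out


# Hypothetical monthly connection hours for one customer (made-up numbers).
monthly_hours = [42, 40, 44, 39, 30, 24, 18]

smoothed = rolling_mean(monthly_hours)  # rolling three-month averages
trend = change(smoothed)                # change in the three-month average

for month, (avg, delta) in enumerate(zip(smoothed, trend), start=1):
    print(month, avg, delta)
```

A steadily negative `trend` corresponds to the "declining usage" symptom mentioned in the text; both series could then be supplied to a churn model as candidate predictors.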



V. CONCLUSION

Customer relationship management is essential to compete 
effectively in today's marketplace. The more effectively you 
can use information about your customers to meet their needs, 
the more profitable you will be. We can conclude that 
operational CRM needs analytical CRM with predictive data 
mining models at its core. The route to a successful business 
requires that you understand your customers and their 
requirements, and data mining is the essential guide [4].

Abstract - In computer networking, the Media Access Control (MAC) address is a unique value associated with a network
adapter. MAC addresses are also known as hardware addresses or physical addresses. TCP/IP and other mainstream
networking architectures generally adopt the OSI model. MAC addresses function at the data link layer (layer 2 in the OSI
model). They allow computers to uniquely identify themselves on a network at this relatively low level. In this paper, a
data encryption technique is presented that uses the MAC address as a key to authenticate the receiving device, such as a
PC, mobile phone, laptop or any other device connected to the network. The technique was tested on sample data, and visual
and numerical measurements were used to check its strength and performance. The experiments showed
that the suggested technique can easily be used to encrypt data transmitted through networks.

I. Introduction

Because of the growing demand for digital signal
transmission in recent times, the problem of illegal data
access by unauthorized persons needs an
intelligent and quick solution. Accordingly, data
security has become a critical and imperative issue in
multimedia data transmission applications. In order to 
protect valuable information from undesirable users or 
against illegal reproduction and modifications, various 
types of cryptographic/encryption schemes are needed. 
Cryptography offers efficient solutions to protect 
sensitive information in a large number of applications 
including personal data security, medical records, 
network security, internet security, diplomatic and 
military communications security, etc. through the 
processes of encryption/decryption. 

Cryptography contains two basic processes. In the
first, recognizable data, called plain data, is
transformed into an unrecognizable form, called cipher
data; this is called enciphering or encrypting
the data. In the second, the
cipher data is transformed back to the original plain data;
this is called deciphering or decrypting the data. A key
is often used to determine whether a user is allowed to
access the information. Once a key has been
used to encipher information, only someone who knows
the correct key can decipher the encrypted data. The key
is the foundation of most data encryption algorithms
today. A good encryption algorithm should still be secure
even if the algorithm is known [1-5].



Encryption is the process of transforming
information to ensure its security. With the huge growth
of computer networks and the latest advances in digital 
technologies, a huge amount of digital data is being 
exchanged over various types of networks. It is often true 
that a large part of this information is either confidential 
or private. As a result, different security techniques have 
been used to provide the required protection [6]. 

MAC addresses are 12-digit hexadecimal numbers 
(48 bits in length). By convention, MAC addresses are 
usually written in one of the following two formats: 



MM:MM:MM:SS:SS:SS or MM-MM-MM-SS-SS-SS 



The first half of a MAC address contains the ID
number of the adapter manufacturer. These IDs are
regulated by an Internet standards body.
The second half of a MAC address represents the serial
number assigned to the adapter by the manufacturer. In
the example (00:A0:C9:14:C8:29), the prefix 00A0C9
indicates that the manufacturer is Intel Corporation.

MAC address spoofing means taking over the
identity of a network interface controller (NIC). Every
networking device is equipped with a globally
unique hardware address called the MAC address. The
uniqueness of MAC addresses is essential in all phases of
network communication because they map all upper-layer
identifiers, e.g. IP addresses, to particular network
interfaces [7].

The most difficult thing in the security area is to assure
an entity's identity. In many cases, string-based pass phrases
are used for this purpose. However, such pass
phrases are easily captured by sophisticated
key-logging malware. For this reason, highly
security-sensitive services such as online finance and
government services try to restrict their operational
environment in different ways. As a prominent example
of such approaches, the Korean government has introduced
a designated platform policy for online banking
services, under which users may use only a limited
number of PCs for renewing their public certificates. The
problem in this approach is how to prove that the PC
currently in use is one of the designated set of PCs. For
this policy to be successful, it is essential to establish the
uniqueness of a PC platform in order to register it as a
designated one [8].

The uniqueness of a hardware platform can be
achieved by deriving platform-unique information from
one or a combination of several hardware-dependent
unique values. Among reasonable candidates such as the
network IP address, serial numbers of hard disks, and
identifiers and mapping addresses of peripheral devices,
the Ethernet MAC address is considered the best, because
many people believe it is an unmodifiable and globally
unique hardwired value. Although a MAC address needs
to be unique only within a network segment, manufacturers
produce Ethernet cards with a pseudo globally-unique
MAC address to eliminate address conflicts when
multiple cards are randomly deployed. The
designated platform solution [9] uses the MAC
address as a factor for constructing the platform-unique
information. This solution is therefore a
kind of multi-factor authentication, because the
platform-unique information is used as an additional factor in
proving both the user identifier and the platform
identifier. This approach can improve the security level
of services such as online games, file repositories and
financial transactions by restricting the locations
(authenticated platforms) of the authenticated users.
When a specific service is first used, it registers the platform
identifier with the management server. When the user tries
to use the service again, only requests issued from
platforms whose identifiers have been registered are allowed;
other requests are denied.

In practice, the MAC address is assumed
not to change while a service is active. A
wireless gateway can be configured to allow network
connections only from specific registered MAC addresses,
especially when a mobile or vehicle-mounted machine makes
an inquiry into a location in the dedicated network [10,
11].

With the advancement of multimedia and network
technologies, a vast number of digital images, videos and
other types of files are now transmitted over the Internet and
through wireless networks for convenient access and
sharing [5]. Multimedia security in general is provided by
a method or a set of methods used to protect the
multimedia content. These methods are heavily based on
cryptography and they enable either communication
security, or security against piracy (Digital Rights
Management and watermarking), or both.
Communication security of digital images and textual
digital media can be accomplished by means of standard
symmetric key cryptography. Such media can be treated
as a binary sequence and the whole data can be encrypted
using a cryptosystem such as the Advanced Encryption
Standard (AES) or the Data Encryption Standard (DES) [12].
In general, when the multimedia data is static (not
real-time streaming) it can be treated as regular binary data and
conventional encryption techniques can be used.
Deciding what level of security is needed is harder
than it looks. To identify an optimal security level, the
cost of the multimedia information to be protected and
the cost of the protection itself must be compared
carefully.

As a result, protection of digital images against illegal 
copying and distribution has become an important issue 
[5, 13, 14, 15]. 

There have been various data encryption techniques 
[16, 17, 18] on multimedia data proposed in the literature. 
Genetic Algorithms are among such techniques. The 
genetic algorithm is a search algorithm based on the 
mechanics of natural selection and natural genetics. 

The genetic algorithm uses two reproduction
operators: crossover and mutation. Reproduction gives
genetic algorithms most of their searching power. To
apply a crossover operator, parents are paired together. 
There are several different types of crossover operators, 
and the types available depend on what representation is 
used for the individuals. The one-point crossover means 
that the parent individuals exchange a random prefix 
when creating the child individuals. The purpose of the 
mutation operator is to simulate the effect of transcription 
errors that can happen with a very low probability when a 
chromosome is mutated. 

Only a few genetic-algorithm-based encryption schemes have
been proposed. Kumar and Rajpal described encryption
using the concept of the crossover operator and a
pseudorandom sequence generator based on an NLFFSR
(Nonlinear Feed Forward Shift Register). The crossover
point is decided by the pseudorandom sequence, and
fully encrypted data is obtained [19]. Kumar,
Rajpal, and Tayal extended this work and applied
mutation after encryption. The encrypted data are
further hidden inside a stego-image [20].

Husainy proposed a genetic-algorithm-based image
encryption method using the mutation and
crossover concepts [21].

A. Tragha et al. describe a new symmetrical block
ciphering system named ICIGA (Improved Cryptography
Inspired by Genetic Algorithms), which generates a
session key in a random process. The block sizes and the
key lengths are variable and can be fixed by the user at
the beginning of ciphering. ICIGA is an enhancement of
the GIC (Genetic algorithms Inspired
Cryptography) system [22].

A different technique for secure and efficient data
encryption is presented in this paper. The
technique employs the MAC address of the receiver as a
key to encrypt data. The crossover and mutation operations of
the Genetic Algorithm (GA) are used with the
MAC address to produce strongly encrypted data that
has good immunity against attackers.

II. Materials and methods

Whenever the sender wants to send data securely to the
receiver, and after the communication session has been
established, the MAC address of the receiver's device is read
by the encryption technique and used as a key to encrypt
the data. The six parts of the MAC address
form a vector (chromosome) of 6 bytes
(genes). For example, (00:A0:C9:14:C8:29) is represented
(in decimal) as:

0 160 201 20 200 41
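
As an illustration of this step, the conversion from the textual MAC address to the six-byte key vector can be sketched as below. The function name `mac_to_key` is hypothetical, and how the receiver's MAC address is actually obtained (the paper's implementation is in Java) is outside the scope of this sketch.

```python
def mac_to_key(mac: str) -> list[int]:
    """Convert 'MM:MM:MM:SS:SS:SS' or 'MM-MM-MM-SS-SS-SS' into six byte values."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}: {mac!r}")
    key = [int(part, 16) for part in parts]  # each octet is two hexadecimal digits
    if any(not 0 <= octet <= 255 for octet in key):
        raise ValueError(f"octet out of range in {mac!r}")
    return key


print(mac_to_key("00:A0:C9:14:C8:29"))  # [0, 160, 201, 20, 200, 41]
```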



The digital data file is treated as a sequence of N bytes.
The data is read and split into a set of (N/6) vectors
(chromosomes) of 6 bytes each (the same length as the MAC
address above). For example, if the source data file has
24 bytes as follows:

2 10 7 15 32 19 9 64 71 3 15 23 1 12 34 18 5 25 30 11 3 16 27 8

Then these bytes are represented as:

Vector #   Genes
1           2   10    7   15   32   19
2           9   64   71    3   15   23
3           1   12   34   18    5   25
4          30   11    3   16   27    8



Now, the technique performs three main operations
on the data vectors above:

1. Crossover, or transposition of the gene order in each
vector. This is done by using a pseudo random
number generation algorithm with a different seed
(initial) value for each vector (the vector number in
this work). After this operation the data
vectors become as follows:

Vector #   Genes
1          32   19    2    7   15   10
2          23   71   64    9    3   15
3          25   18   34    5    1   12
4          27    3   30    8   11   16

2. Mutation, or substitution of the value of each gene in
each vector. This is done by applying an eXclusive-OR
(XOR) Boolean operation between the MAC
address vector and each of the data vectors. After
this operation the data vectors become as
follows:

Vector #   Genes
1          32  179  203   19  199   35
2          23  231  137   29  203   38
3          25  178  235   17  201   37
4          27  163  215   28  195   57

3. Re-sequencing, or reordering of the vectors
themselves. To add extra distortion to the encrypted data,
the technique reorders the sequence of the
data vectors randomly, as follows:

Vector #   Genes
3          25  178  235   17  201   37
4          27  163  215   28  195   57
1          32  179  203   19  199   35
2          23  231  137   29  203   38

When the three main operations are completed, the
technique produces the encrypted data file, which is
as follows:

25 178 235 17 201 37 27 163 215 28 195 57 32 179 203 19 199 35 23 231 137 29 203 38

To ensure that the encryption technique really
introduces enough distortion in the source data, the
measurement of Signal to Noise Ratio (SNR) can be used
here. The SNR is calculated by using the following
formula, where S and E represent the source and the
encrypted image respectively:

SNR = [formula missing in source]    (1)

For the numerical example above, the SNR between the
source and the encrypted data is 1.200 dB. This low ratio
indicates enough distortion to give the source data good
protection against attackers.
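
The three operations of this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' Java implementation: the function names are hypothetical, Python's `random.Random` stands in for the unspecified pseudo random number generator (so the permutations differ from those in the numerical example), the `seed` used for re-sequencing is an assumed value shared by sender and receiver, and the data length is assumed to be a multiple of 6 because the text does not describe padding.

```python
import random

VECTOR_LEN = 6  # one gene per MAC address octet


def split_vectors(data: bytes) -> list[list[int]]:
    """Split the data into 6-byte vectors (length must be a multiple of 6 here)."""
    if len(data) % VECTOR_LEN:
        raise ValueError("data length must be a multiple of 6 in this sketch")
    return [list(data[i:i + VECTOR_LEN]) for i in range(0, len(data), VECTOR_LEN)]


def gene_order(vector_number: int) -> list[int]:
    """Pseudo-random gene permutation seeded by the vector number."""
    order = list(range(VECTOR_LEN))
    random.Random(vector_number).shuffle(order)
    return order


def vector_order(count: int, seed: int) -> list[int]:
    """Pseudo-random order of the vectors (seed is an assumed shared value)."""
    order = list(range(count))
    random.Random(seed).shuffle(order)
    return order


def encrypt(data: bytes, key: list[int], seed: int = 0) -> bytes:
    out = []
    for number, vec in enumerate(split_vectors(data), start=1):
        crossed = [vec[j] for j in gene_order(number)]    # 1. crossover
        out.append([g ^ k for g, k in zip(crossed, key)])  # 2. mutation (XOR)
    order = vector_order(len(out), seed)                   # 3. re-sequencing
    return bytes(g for i in order for g in out[i])


def decrypt(cipher: bytes, key: list[int], seed: int = 0) -> bytes:
    shuffled = split_vectors(cipher)
    order = vector_order(len(shuffled), seed)
    vectors = [None] * len(shuffled)
    for position, original_index in enumerate(order):      # undo re-sequencing
        vectors[original_index] = shuffled[position]
    plain = []
    for number, vec in enumerate(vectors, start=1):
        crossed = [g ^ k for g, k in zip(vec, key)]        # undo mutation
        restored = [0] * VECTOR_LEN
        for position, j in enumerate(gene_order(number)):  # undo crossover
            restored[j] = crossed[position]
        plain.extend(restored)
    return bytes(plain)


key = [0, 160, 201, 20, 200, 41]  # 00:A0:C9:14:C8:29
source = bytes([2, 10, 7, 15, 32, 19, 9, 64, 71, 3, 15, 23,
                1, 12, 34, 18, 5, 25, 30, 11, 3, 16, 27, 8])
cipher = encrypt(source, key)
print(list(cipher))
print(decrypt(cipher, key) == source)
```

Because XOR is its own inverse and both permutations are regenerated from their seeds, the receiver can undo the three steps in reverse order using only the MAC address and the shared seed.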

III. Results and Discussion

To let the reader judge the performance of
the suggested technique, an experiment was carried out
on a bitmap (.bmp) image to show the strength of the
encryption technique visually. The code
implementing the proposed method was written
in the Java programming language.

Key space analysis, key sensitivity analysis, statistical 
analysis and Signal to Noise Ratio (SNR) are some of the 
security tests that are recommended to be used for testing 
the performance, strength and immunity of encryption 
methods. 

A. Key space analysis 

In any effective encryption system, the key space
should be large enough to make a brute-force attack
infeasible. The secret key (MAC address) in the
suggested technique is 6 bytes = 48 bits long, giving a key
space of 2^48 possible keys, which is considered a relatively
sufficient number of bits for the secret key. In this work,
the number of bits in the key is fixed by the MAC address and
cannot be increased or decreased.

B. Key sensitivity 

To evaluate the key sensitivity of the proposed
technique, a one-bit change was made to the secret key (MAC
address), and the modified key was then used to decrypt the
encrypted image. The image decrypted with the wrong key is
completely different from the image decrypted
with the correct key, as shown in Fig. 1. We
conclude that the proposed encryption technique is
highly sensitive to the key: even an almost perfect guess
of the key does not reveal any information about the plain
image/data.




Figure 1: (a) Source image (b) Encrypted image (c) Decrypted
image with wrong key



C. Statistical analysis 

Statistical attack is a commonly used method in
cryptanalysis and hence an effective encryption system
should be robust against any statistical attack. The
histogram and the correlation between neighboring
pixels in the source and in the encrypted image are
calculated to demonstrate the strength of the proposed
encryption system against statistical attack.

Fig. 2 shows the histograms of the source image in 
Figure 1 and its encrypted image respectively. It's clear 
from Fig. 2 that the histogram of the encrypted image is 
completely different from the histogram of the source 
image and does not provide any useful information to 
employ statistical attack. 




Figure 2: (a) Histogram of the source image in Fig. 1(a)
(b) Histogram of its encrypted image

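The correlation coefficient r is calculated by using
the following formula:

r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}}    (2)

where N is the number of pixel pairs,

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i    (3)

and

\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i    (4)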

The correlation coefficient for horizontal neighbor
pixels of the source image in Fig. 1 is r = 0.100231, while
r = 0.005561 for its encrypted image. It is clear from these
two values that the
correlation between neighboring pixels in the source
image is greatly reduced in the encrypted image. The
results for vertical and
diagonal neighbor pixels are similar to those for horizontal
neighbor pixels. The Signal to Noise Ratio (SNR)
calculated for the encrypted image in Fig. 1(b) is
SNR = 4.04401.
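
The correlation test for horizontally adjacent pixels can be sketched as follows, assuming that formula (2) is the standard Pearson correlation coefficient and that the image is available as a two-dimensional list of grey levels. The function names and the small sample patch are hypothetical, and the sketch assumes the pixel values are not all identical (otherwise the denominator would be zero).

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r of paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def horizontal_pairs(image):
    """Collect (pixel, right-hand neighbour) pairs from a 2-D list of grey levels."""
    xs, ys = [], []
    for row in image:
        for left, right in zip(row, row[1:]):
            xs.append(left)
            ys.append(right)
    return xs, ys


# Hypothetical 4x4 greyscale patch with a smooth horizontal gradient.
image = [
    [10, 12, 14, 16],
    [20, 22, 24, 26],
    [30, 32, 34, 36],
    [40, 42, 44, 46],
]
xs, ys = horizontal_pairs(image)
r = correlation(xs, ys)
print(round(r, 6))
```

For a smooth patch like this one, r is close to 1; for a well-scrambled image it should be close to 0. Vertical and diagonal neighbours can be tested the same way by changing how the pairs are collected.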

Some other images and data files were tested, and the
same behavior of the suggested encryption technique was
recorded.

From the above tests, we note the following points:

The low value of the SNR indicates that there is much
distortion in the encrypted image. This means that the
encrypted image has good immunity against
Human Visual System (HVS) attack.

The correlation coefficient of the
encrypted image is greatly reduced
compared with the
correlation coefficient of the source image.

IV. CONCLUSIONS

In this paper, a technique for data encryption has been
presented which employs the MAC address of the receiver
device as a key for encryption. The technique
provides good immunity for data transmitted
through networks. The visual and analytical tests showed
that the suggested technique can be used effectively for
image/data encryption in networks.

Abstract - Retinal exudates classification and
identification of diabetic retinopathy to diagnose the eyes
using fundus images requires automation. This research
work proposes retinal exudates classification.
Representative features are obtained from the fundus
images using a segmentation method. Fuzzy logic and the back
propagation algorithm are trained to identify the presence
of exudates in a fundus image. The presence of exudates is
identified more clearly using fuzzy logic and the back
propagation algorithm. By knowing the outputs of the proposed
algorithm during testing, accurate diagnosis and
prescription for treatment of the affected eyes can be done.
Fifty fundus images are used for testing. The performance
of the proposed algorithm is 96% (48 images are correctly
classified). Simulation results show the effectiveness of the
proposed algorithm in retinopathy classification. A very large
database can be created from the fundus images collected from
diabetic retinopathy patients, which can be used for future
work.

Keywords: Diabetic retinopathy; fundus image; 
exudates detection; Fuzzy logic; back propagation 
algorithm. 

I. INTRODUCTION 

Diabetic Retinopathy (DR) causes blindness [1]. The
prevalence of retinopathy varies with the age of onset of
diabetes and the duration of the disease. Color fundus
images are used by ophthalmologists to study eye diseases
like diabetic retinopathy [2]. Large blood clots called
hemorrhages are also found. Hard exudates are yellow lipid
deposits which appear as bright yellow lesions. The bright
circular region from which the blood vessels emanate is
called the optic disk. The fovea defines the center of the
retina and is the region of highest visual acuity. The
spatial distribution of exudates, microaneurysms and
hemorrhages [3], especially in relation to the fovea, can
be used to determine the severity of diabetic retinopathy.

Hard exudates are shiny, yellowish intraretinal protein
deposits, irregularly shaped, and found in the posterior
pole of the fundus [4]. Hard exudates may be observed in
several retinal vascular pathologies. Diabetic macular
edema is the main cause of visual impairment in diabetic
patients. Exudates are well contrasted with respect to the
background that surrounds them, and their shape and size
vary considerably [5]. Hard and soft exudates can be
distinguished by their color and the sharpness of their
borders. Various methods have been reported for the
detection of exudates. Efficient algorithms for the
detection of the optic disc and retinal exudates have been
presented in [6] [7].

Thresholding and region growing methods were used to
detect exudates in [8][9]: a median filter removes noise,
bright lesions and dark lesions are segmented by
thresholding, region growing is performed, and exudate
regions are then identified with Bayesian, Mahalanobis,
and nearest neighbor (NN) classifiers. Recursive region
growing segmentation (RRGS) [10] has been used for
automated detection of diabetic retinopathy. Adaptive
intensity thresholding in combination with RRGS was used
to detect exudates. The methods in [11-17] combine color
and sharp edge features to detect exudates. First they find
yellowish objects, then they find sharp edges using
various rotated versions of Kirsch masks on the green
component of the original image. Yellowish objects with
sharp edges are classified as exudates.

II. MATERIALS AND METHODS

This research work proposes Fuzzy logic and the back
propagation algorithm (BPA) for identifying the defect in
the diabetic retinopathy image. Segmentation is used for
feature extraction. The extracted features are input to
the Fuzzy logic, and its output is input to the BPA
network. To achieve the maximum percentage of
identification of the exudates, proper input data for the
Fuzzy logic, an optimum topology of the BPA and correct
training of the BPA with suitable parameters are required.

A large number of exudate and non-exudate images are
collected. Features are extracted from the images using
segmentation. The features are input to the Fuzzy logic.
The outputs of the Fuzzy logic are given to the input layer
of the BPA. Labeling is given in the output layer of the
BPA; the labeling indicates the exudates. The final weights
obtained after training the Fuzzy logic and BPA are used to
identify the exudates. Figure 1 explains the overall
sequence of the proposed methodology.

A. FUZZY LOGIC

Fuzzy Logic (FL) is a multi-valued logic that allows
intermediate values to be defined between conventional
evaluations like true/false, yes/no, high/low. Fuzzy
systems are an alternative to traditional notions of set
membership and logic.
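
To make the idea of intermediate truth values concrete,
here is a minimal Python sketch of a triangular membership
function. The triangular shape, the breakpoints and the
example of grading a solidity value are illustrative
assumptions only; the paper does not specify which
membership functions are used.

```python
def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set, in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy set "medium solidity" centred on 0.7.
for solidity in (0.55, 0.6111, 0.7, 0.9):
    print(solidity, round(triangular(solidity, 0.5, 0.7, 0.9), 3))
```

A crisp (true/false) test would return only 0 or 1 for the
same inputs, whereas the membership function grades how
strongly each value belongs to the set.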

Training (Figure 2) and testing (Figure 3) of the fuzzy
logic map the input pattern to the target output data. For
this, the inbuilt function prepares a membership table,
and finally a set of numbers is stored. During testing,
the membership function is used to test the pattern.

B. BACK PROPAGATION ALGORITHM

An artificial neural network (ANN) algorithm is used to
estimate the exudates of the retinopathy image. The ANN is
trained using 3 values in the input layer and two values
in the output layer: stress and strain that are to be
estimated during the testing stage of the ANN algorithm.
The number of nodes in the hidden layer varies depending
upon the weight updating equations. The exact number of
nodes is fixed by trial and error, with the accuracy of
estimation by the BPA used as the criterion for the
performance of the ANN algorithm. The training patterns
used for the ANN are chosen from the segmented features.
During the training process, the learning capability of
the ANN algorithm varies depending upon the type of values
present in the patterns.

The steepest-descent method is used in BPA to move toward
a minimum of the error. The number of layers is decided
initially, and then the number of nodes in the hidden
layer is decided.

The network uses all the 3 layers (input, hidden and
output). The input layer uses 3 nodes, the hidden layer
has 2 nodes and the output layer includes two nodes.

Random weights are used for the connections between
nodes. The error at the output layer of the network is
calculated by presenting a pattern to the input layer of
the network. The weights between the layers are updated by
propagating the error backwards to the input layer. All
the training patterns are presented to the network for
learning; this forms one iteration. At the end of each
iteration, test patterns are presented to the ANN and its
prediction performance is evaluated. Training of the ANN
is continued until the desired prediction performance is
reached.

The flow-chart for BPA is shown in Figure 4.

Steps Involved in Training BPA

Forward Propagation 

The hidden layer connections of the network are 
initialized with weights. 

The inputs and outputs of a pattern are presented 
to the network. 

The output of each node in the successive layers is
calculated by using equation (1).

$$ O(\text{output of a node}) = \frac{1}{1 + \exp\left(-\sum w_{ij} x_i\right)} \tag{1} $$

For each pattern, error is calculated using equation (2).

$$ E(p) = \frac{1}{2} \sum \left(d(p) - o(p)\right)^2 \tag{2} $$

Figure 4 Flow-chart of BPA

Reverse Propagation



For the nodes, the error in the output layer is
calculated using equation (3).

$$ \delta(\text{output layer}) = o(1 - o)(d - o) \tag{3} $$

The weights between output layer and hidden layer are
updated by using equation (4).

$$ W(n+1) = W(n) + \eta\,\delta(\text{output layer})\,o(\text{hidden layer}) \tag{4} $$

The error for the nodes in the hidden layer is calculated
by using equation (5).

$$ \delta(\text{hidden layer}) = o(1 - o) \sum \delta(\text{output layer})\,W(\text{updated weights between hidden and output layer}) \tag{5} $$

The weights between hidden and input layer are updated by
using equation (6).

$$ W(n+1) = W(n) + \eta\,\delta(\text{hidden layer})\,o(\text{input layer}) \tag{6} $$

The above steps complete one weight update.



The above steps are followed for the second pattern for
the subsequent weight update. When all the training
patterns have been presented, one cycle of iteration, or
epoch, is completed. The errors of all the training
patterns are calculated and displayed on the monitor as
the MSE.

$$ E(\text{mse}) = \sum E(p) \tag{7} $$
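
The training steps in equations (1) to (7) can be sketched
in Python as follows. This is a minimal illustration, not
the authors' implementation: the default of 5 hidden nodes
follows the 3 x 5 x 1 topology mentioned in the
implementation section, while the learning rate, the
initial weight range, the number of epochs, the toy
patterns and all function names are assumptions. As the
text states for equation (5), the hidden-layer error here
uses the already updated hidden-to-output weights; many
textbook variants use the weights from before the update.

```python
import math
import random


def sigmoid(s):
    # Equation (1): node output from its weighted input sum.
    return 1.0 / (1.0 + math.exp(-s))


def forward(x, w_ih, w_ho):
    # w_ih[j][i]: input i -> hidden j; w_ho[k][j]: hidden j -> output k.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_ih]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_ho]
    return hidden, output


def train_pattern(x, d, w_ih, w_ho, eta):
    hidden, output = forward(x, w_ih, w_ho)
    # Equation (2): error of this pattern (measured before the update).
    error = 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, output))
    # Equation (3): error terms of the output nodes.
    delta_out = [ok * (1 - ok) * (dk - ok) for dk, ok in zip(d, output)]
    # Equation (4): update hidden-to-output weights in place.
    for k, row in enumerate(w_ho):
        for j in range(len(row)):
            row[j] += eta * delta_out[k] * hidden[j]
    # Equation (5): hidden error terms, using the updated weights.
    delta_hid = [
        hidden[j] * (1 - hidden[j])
        * sum(delta_out[k] * w_ho[k][j] for k in range(len(w_ho)))
        for j in range(len(hidden))
    ]
    # Equation (6): update input-to-hidden weights in place.
    for j, row in enumerate(w_ih):
        for i in range(len(row)):
            row[i] += eta * delta_hid[j] * x[i]
    return error


def train(patterns, n_hidden=5, eta=0.5, epochs=500, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    history = []
    for _ in range(epochs):
        # Equation (7): sum of the pattern errors over one epoch.
        history.append(sum(train_pattern(x, d, w_ih, w_ho, eta) for x, d in patterns))
    return w_ih, w_ho, history


# Toy data (not from the paper): the target equals the first input.
toy = [([0.0, 0.0, 1.0], [0.0]), ([1.0, 1.0, 1.0], [1.0]),
       ([1.0, 0.0, 1.0], [1.0]), ([0.0, 1.0, 1.0], [0.0])]
_, _, errors = train(toy)
print(errors[0], errors[-1])
```

Printing the first and last epoch errors gives a quick
check that the summed error falls as training proceeds.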

II. EXPERIMENTAL WORK 

Color retinal images were obtained from Benajami
Hospital, Nagarkoil (India). In accordance with the
National Screening Committee standards, all the images
were obtained using a Canon CR6-45 Non-Mydriatic
(CR6-45NM) retinal camera. A modified digital back unit
(Sony PowerHAD 3CCD color video camera and Canon CR-TA)
is connected to the fundus camera to convert the fundus
image into a digital image. The digital images are
processed with an image grabber and saved on the hard
drive of a Windows 2000 based Pentium IV.

Sample images of normal (Figure 5) and abnormal
(Figure 6) types are given.

Fig. 5 Normal fundus images

Figure 5 shows sample images of eyes in good condition.

Fig. 6 Hard exudates

Figure 6 shows sample images of eyes with hard exudates.



III IMPLEMENTATION OF FUZZY LOGIC AND BPA

The Fuzzy logic and BPA are trained with the data given
in Table 1. Each row is made up of 4 variables, and a
labeling is given in the last column. Eighteen patterns
are considered for training. These 17 patterns are taken
from the segmented image. Additional hard exudate images
can also be considered, from which additional patterns can
be obtained. A topology of 3 (nodes in the input layer) x
5 (nodes in the hidden layer) x 1 (node in the output
layer) is used for training the BPA. The final weights
obtained are used to separate the segmented exudates from
the noise present in the segmented image.



Table 1 Data for training Fuzzy logic and BPA

              Training Inputs                     Target outputs
          Filled Area   Solidity   Orientation    (Labeling)

55        59            0.6111     -16.5837
59        59            0.7024     -29.5294
61        61            0.5980      43.1644
64        64            0.5161      -4.1202
69        70            0.6970      20.1090
75        75            0.7732       7.9202
78        80            0.6393      82.4571
89        91            0.5973      84.0033
100       101           0.5587     -39.8444
104       108           0.7324     -12.7048
109       109           0.8790      42.4872
139       139           0.7128      81.1306
165       165           0.9016      45.6726
167       180           0.5860      55.3490
214       219           0.7431      40.4485
251       251           0.6452      80.2676
5108      5117          0.8913      91.3917

IV RESULTS AND DISCUSSION 

For template matching and comparison purposes,
representative exudates are isolated from the original
retinopathy images to create exudate templates, which are
presented in Figure 7.

Fig. 7 (a) Sample Hard Exudates

Fig. 7 (b) Segmented hard exudates

Figure 7a shows sample templates out of the fifty
templates collected. Each template has a different
scattering of the exudates. Figure 7b shows the segmented
exudates: black indicates the background of the image and
white shows the hard exudates. Statistical features are
computed for the hard exudate templates. The statistical
features considered are 'Convex Area', 'Solidity',
'Orientation' and 'Filled Area'.



Fig. 8a A portion of the original true color image 

Figure 8a presents a portion of the original diabetic
retinopathy image in true color. The plane-1 information
of the original image is shown in Figure 8b. Plane 2
(Figure 8c) and plane 3 (Figure 8d) are also shown.
Identification of exudates is done using plane-2
information.

Fig. 8b Plane 1 of the image in Figure 8a

Fig. 8c Plane 2 of the image in Figure 8a

Fig. 8e Plane 2 of the image segmented

The hard exudates are found scattered in the retinopathy
image. The segmented image shows more noise. Figure 9a
plots the sum of the 9 pixel values in each window against
the window number as the image to be segmented is scanned.
The average sum is above 1500, which indicates a slightly
white background, as can be seen from Figure 8c (plane 2).
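
The window scan behind Figures 9a and 9b can be sketched
in Python as follows. The sketch assumes non-overlapping
3 x 3 windows scanned row by row over a single image plane
stored as a list of rows. The text states only that 9
pixel values are summed per window, so the stride and scan
order are assumptions, and `window_stats` is a hypothetical
name.

```python
def window_stats(plane, size=3):
    """Return (sums, means) for non-overlapping size x size windows."""
    sums, means = [], []
    rows, cols = len(plane), len(plane[0])
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            total = sum(
                plane[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size)
            )
            sums.append(total)
            means.append(total / (size * size))
    return sums, means


# A bright 3 x 6 plane gives two windows, each summing to 9 * 200.
bright = [[200] * 6 for _ in range(3)]
sums, means = window_stats(bright)
print(sums, means)
```

With 8-bit pixels, a window sum above 1500 corresponds to
a mean pixel value above roughly 167, which matches the
description of a slightly white background.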
Fig. 9a Total pixel values

Fig. 8d Plane 3 of the image in Figure 8a

Fig. 9b Mean of each window

The mean of each window is shown in Figure 9b. The
imfeature function is applied to the segmented image to
obtain the area of the labeled objects. The optic disc in
the image is removed by using a threshold: if the area of
an object is greater than 500, it is treated as the optic
disc. Using the bounding box of this object, the object is
filled with black. Hence the remaining objects are either
noise or exudates. Figure 10 shows the image with the eye
disc removed by applying statistical features.
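
The area-threshold step described above can be sketched in
plain Python as follows. This is an illustration under
stated assumptions, not the authors' MATLAB code: the
segmented image is taken to be a binary mask stored as a
list of rows of 0/1 values, objects are labeled with
4-connectivity, and the names `label_objects` and
`remove_large_objects` are hypothetical. Only the threshold
of 500 and the bounding-box fill come from the text.

```python
from collections import deque


def label_objects(mask):
    """Group foreground pixels into 4-connected objects (lists of (row, col))."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                objects.append(pixels)
    return objects


def remove_large_objects(mask, max_area=500):
    """Black out the bounding box of every object whose area exceeds max_area."""
    cleaned = [row[:] for row in mask]  # leave the input mask untouched
    for pixels in label_objects(mask):
        if len(pixels) > max_area:
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            for y in range(min(ys), max(ys) + 1):
                for x in range(min(xs), max(xs) + 1):
                    cleaned[y][x] = 0
    return cleaned
```

Because the whole bounding box is blacked out, any small
object lying inside the box of the large object is removed
as well; this follows the fill described in the text.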




Fig. 10 Eyedisc removed 

Sample outputs of the statistical features obtained with
imfeature are shown in Table 1.

V CONCLUSION 

The main focus of this work is on segmenting the diabetic
retinopathy image and classifying the exudates.
Segmentation is done, and classification of the exudates
is done using the Fuzzy logic and back propagation
algorithm (BPA) network. The classification performance
for exudates is better when BPA is used. The proposed
Fuzzy logic and BPA classify the segmented information of
the image as hard exudates or not.

1. All the fundus images in this work have to be
transformed to a standard template image condition. This
corrects the illumination effect on the images.

2. Detection of exudates is more accurate only when the
fundus image is of good quality.